George-Polya opened a new issue, #17732:
URL: https://github.com/apache/tvm/issues/17732

   
   
   ### Expected behavior
   
   Calling the global function `vm.builtin.paged_attention_kv_cache_popn` via `tvm.get_global_func` should succeed and return the packed function.
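
    Roughly what I expect to work (that this builtin pops the last `n` tokens from a paged-attention KV cache is my reading of the name, not something I have verified):

    ```python
    import tvm

    # Expected: the lookup succeeds and returns a PackedFunc instead of raising.
    popn_fn = tvm.get_global_func("vm.builtin.paged_attention_kv_cache_popn")
    assert popn_fn is not None
    ```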
   
   
   ### Actual behavior
   
   ```
    01:06:07 | INFO | loading NVILA-Lite-2B from /data/models/mlc/dist/NVILA-Lite-2B/ctx32768/NVILA-Lite-2B-q4f16_ft/NVILA-Lite-2B-q4f16_ft-cuda.so
    Traceback (most recent call last):
      File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/opt/NanoLLM/nano_llm/vision/video.py", line 44, in <module>
        model = NanoLLM.from_pretrained(
      File "/opt/NanoLLM/nano_llm/nano_llm.py", line 91, in from_pretrained
        model = MLCModel(model_path, **kwargs)
      File "/opt/NanoLLM/nano_llm/models/mlc.py", line 173, in __init__
        self._kv_cache_pop = tvm.get_global_func('vm.builtin.paged_attention_kv_cache_popn')  # 'vm.builtin.kv_state_popn')
      File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/registry.py", line 235, in get_global_func
        return _get_global_func(name, allow_missing)
      File "tvm/_ffi/_cython/./packed_func.pxi", line 352, in tvm._ffi._cy3.core._get_global_func
    ValueError: Cannot find global function vm.builtin.paged_attention_kv_cache_popn
   ```
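
    The commented-out name in my code (`vm.builtin.kv_state_popn`) suggests the builtin was renamed in newer TVM. A minimal workaround sketch, assuming the rename (I have not confirmed the new name exists in every build):

    ```python
    import tvm

    def find_kv_cache_popn():
        # Try the old name first, then the (assumed) renamed builtin.
        # allow_missing=True makes get_global_func return None instead of raising.
        for name in ("vm.builtin.paged_attention_kv_cache_popn",  # works for me on tvm==0.15.0
                     "vm.builtin.kv_state_popn"):                 # assumed newer name
            fn = tvm.get_global_func(name, allow_missing=True)
            if fn is not None:
                return fn
        raise RuntimeError("no KV-cache popn builtin found in this TVM build")
    ```
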
    If I use tvm==0.15.0, I don't see this error, but I hit the [vm.builtin.paged_attention_kv_cache_attention_with_fused_qkv error](https://github.com/mlc-ai/mlc-llm/issues/2018) instead.
   
   ### Environment
   
   
    - Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): CUDA
    - Operating system (e.g. Ubuntu/Windows/MacOS/...): JetPack 6.1 (Ubuntu 22.04)
    - Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): Jetson Orin AGX
    - How you installed MLC-LLM (`conda`, source): source, mlc-llm==0.19.0
    - How you installed TVM-Unity (`pip`, source): pip / tvm==0.19.0
    - Python version (e.g. 3.10): 3.10
    - GPU driver version (if applicable): 540.4.0
    - CUDA/cuDNN version (if applicable):  12.6
    - TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models):
    ```
   USE_NVTX: OFF
   USE_GTEST: OFF
   SUMMARIZE: ON
   TVM_DEBUG_WITH_ABI_CHANGE: OFF
   USE_IOS_RPC: OFF
   USE_MSC: OFF
   USE_ETHOSU: 
   CUDA_VERSION: 12.6
   USE_LIBBACKTRACE: OFF
   DLPACK_PATH: 3rdparty/dlpack/include
   USE_TENSORRT_CODEGEN: OFF
   USE_THRUST: ON
   USE_TARGET_ONNX: OFF
   USE_AOT_EXECUTOR: OFF
   BUILD_DUMMY_LIBTVM: OFF
   USE_CUDNN: ON
   USE_TENSORRT_RUNTIME: OFF
   USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
   USE_CCACHE: ON
   USE_ARM_COMPUTE_LIB: OFF
   USE_CPP_RTVM: 
   USE_OPENCL_GTEST: /path/to/opencl/gtest
   TVM_LOG_BEFORE_THROW: OFF
   USE_MKL: OFF
   USE_PT_TVMDSOOP: OFF
   MLIR_VERSION: NOT-FOUND
   USE_CLML: OFF
   USE_STACKVM_RUNTIME: ON
   USE_GRAPH_EXECUTOR_CUDA_GRAPH: ON
   ROCM_PATH: /opt/rocm
   USE_DNNL: OFF
   USE_MSCCL: OFF
   USE_NNAPI_RUNTIME: OFF
   USE_VITIS_AI: OFF
   USE_MLIR: OFF
   USE_RCCL: OFF
   USE_LLVM: /usr/bin/llvm-config --link-static
   USE_VERILATOR: OFF
   USE_TF_TVMDSOOP: OFF
   USE_THREADS: ON
   USE_MSVC_MT: OFF
   BACKTRACE_ON_SEGFAULT: OFF
   USE_GRAPH_EXECUTOR: ON
   USE_NCCL: OFF
   USE_ROCBLAS: OFF
   GIT_COMMIT_HASH: 3f30919055d864af3dd03c42b3cb0a878aa2cc25
   USE_VULKAN: OFF
   USE_RUST_EXT: OFF
   USE_CUTLASS: ON
   USE_CPP_RPC: OFF
   USE_HEXAGON: OFF
   USE_CUSTOM_LOGGING: OFF
   USE_UMA: OFF
   USE_FALLBACK_STL_MAP: OFF
   USE_SORT: ON
   USE_RTTI: ON
   GIT_COMMIT_TIME: 2024-09-26 09:52:46 -0400
   USE_HIPBLAS: OFF
   USE_HEXAGON_SDK: /path/to/sdk
   USE_BLAS: none
   USE_ETHOSN: OFF
   USE_LIBTORCH: OFF
   USE_RANDOM: ON
   USE_CUDA: ON
   USE_COREML: OFF
   USE_AMX: OFF
   BUILD_STATIC_RUNTIME: OFF
   USE_CMSISNN: OFF
   USE_KHRONOS_SPIRV: OFF
   USE_CLML_GRAPH_EXECUTOR: OFF
   USE_TFLITE: OFF
   USE_HEXAGON_GTEST: /path/to/hexagon/gtest
   PICOJSON_PATH: 3rdparty/picojson
   USE_OPENCL_ENABLE_HOST_PTR: OFF
   INSTALL_DEV: OFF
   USE_PROFILER: OFF
   USE_NNPACK: OFF
   LLVM_VERSION: 17.0.6
   USE_MRVL: OFF
   USE_OPENCL: OFF
   COMPILER_RT_PATH: 3rdparty/compiler-rt
   USE_NNAPI_CODEGEN: OFF
   RANG_PATH: 3rdparty/rang/include
   USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
   USE_OPENMP: OFF
   USE_BNNS: OFF
   USE_FLASHINFER: 
   USE_CUBLAS: ON
   USE_METAL: OFF
   USE_MICRO_STANDALONE_RUNTIME: OFF
   USE_HEXAGON_EXTERNAL_LIBS: OFF
   USE_ALTERNATIVE_LINKER: AUTO
   USE_BYODT_POSIT: OFF
   USE_NVSHMEM: OFF
   USE_HEXAGON_RPC: OFF
   USE_MICRO: OFF
   DMLC_PATH: 3rdparty/dmlc-core/include
    INDEX_DEFAULT_I64: ON
   USE_RELAY_DEBUG: OFF
   USE_RPC: OFF
   USE_TENSORFLOW_PATH: none
   TVM_CLML_VERSION: 
   USE_MIOPEN: OFF
   USE_ROCM: OFF
   USE_PAPI: OFF
   USE_CURAND: ON
   TVM_CXX_COMPILER_PATH: /usr/bin/c++
   HIDE_PRIVATE_SYMBOLS: ON
   ```
   
   ### Steps to reproduce
   
    1. Call `tvm.get_global_func('vm.builtin.paged_attention_kv_cache_popn')` (see the snippet below).
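
    Minimal repro on my setup (pip tvm==0.19.0, CUDA 12.6):

    ```python
    import tvm

    # Raises: ValueError: Cannot find global function
    #         vm.builtin.paged_attention_kv_cache_popn
    tvm.get_global_func("vm.builtin.paged_attention_kv_cache_popn")
    ```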
   
   ### Triage
   
   Please refer to the list of label tags 
[here](https://github.com/apache/tvm/wiki/Issue-Triage-Labels) to find the 
relevant tags and add them below in a bullet format (example below).
   
   * needs-triage
   

