[I] [Bug] Direct function calls fail when using VM over RPC [tvm]

via GitHub Wed, 02 Jul 2025 21:05:14 -0700


w1049 opened a new issue, #18108:
URL: https://github.com/apache/tvm/issues/18108


   ### Expected behavior
   
   The VM should work correctly for both direct calls and `set_input` + `invoke_stateful`, whether running locally or over RPC.
   
   ### Actual behavior
   
   When using the VM over RPC, direct calls fail while `set_input` + `invoke_stateful` works fine.
   
   ### Environment
   
   commit 956b65910c475789be4ea5ec0a39a81af99bcfd8
   
   ### Steps to reproduce
   Execute the script:
   ```python
   import numpy as np
   
   import tvm
   from tvm import rpc
   from tvm.contrib import utils
   from tvm import relax
   
   from tvm.script import relax as R
   
   
   # A simple Relax function
   @R.function
   def relax_matmul(
       data: R.Tensor(("n", 784), dtype="float32"),
       w0: R.Tensor((784, 128), dtype="float32"),
   ) -> R.Tensor(("n", 128), dtype="float32"):
       with R.dataflow():
           lv0 = R.matmul(data, w0)
           R.output(lv0)
       return lv0
   
   
   mod = tvm.IRModule.from_expr(relax_matmul)
   
   target = "llvm"
   ex = tvm.compile(mod, target=target)
   
   # ===== Local Execution =====
   dev = tvm.cpu()
   vm = relax.VirtualMachine(ex, dev)
   
   data = tvm.nd.array(np.random.rand(1024, 784).astype("float32"), dev)
   w0 = tvm.nd.array(np.random.rand(784, 128).astype("float32"), dev)
   
   vm.set_input("relax_matmul", data, w0)
   vm.invoke_stateful("relax_matmul")
   out = vm.get_outputs("relax_matmul")
   print("Invoke stateful:", out.shape)
   
   out = vm["relax_matmul"](data, w0)
   print("Direct:", out.shape)
   
   # ===== RPC Execution =====
   temp = utils.tempdir()
   path = temp.relpath("lib.tar")
   ex.export_library(path)
   
   remote = rpc.LocalSession()
   remote.upload(path)
   rt_mod = remote.load_module("lib.tar")
   
   dev = remote.cpu()
   vm = relax.VirtualMachine(rt_mod, dev)
   
   data = tvm.nd.array(np.random.rand(1024, 784).astype("float32"), dev)
   w0 = tvm.nd.array(np.random.rand(784, 128).astype("float32"), dev)
   
   vm.set_input("relax_matmul", data, w0)
   vm.invoke_stateful("relax_matmul")
   out = vm.get_outputs("relax_matmul")
   print("RPC Invoke stateful:", out.shape)
   
   out = vm["relax_matmul"](data, w0)
   print("RPC Direct:", out.shape)
   ```
   
   The output is:
   ```
   Invoke stateful: (1024, 128)
   Direct: (1024, 128)
   2025-07-03 10:56:13.950 INFO load_module /tmp/tmpbr579u0v/lib.tar
   RPC Invoke stateful: (1024, 128)
   Exception caught during TVMFFIGetTypeInfo:
   Traceback (most recent call last):
     File "<unknown>", in __pyx_pw_3tvm_3ffi_4core_8Function_1__call__(_object*, _object* const*, long, _object*)
     File "<unknown>", in __pyx_f_3tvm_3ffi_4core_FuncCall(void*, _object*, TVMFFIAny*, int*) [clone .constprop.0]
     File "tvm/src/runtime/rpc/rpc_module.cc", line 132, in tvm::runtime::RPCWrappedFunc::operator()(tvm::ffi::PackedArgs, tvm::ffi::Any*) const
     File "tvm/src/runtime/rpc/rpc_local_session.cc", line 102, in tvm::runtime::LocalSession::CallFunc(void*, tvm::ffi::PackedArgs, std::function<void (tvm::ffi::PackedArgs)> const&)
     File "tvm/src/runtime/vm/vm.cc", line 545, in tvm::runtime::vm::VirtualMachineImpl::InvokeClosurePacked(tvm::ffi::ObjectRef const&, tvm::ffi::PackedArgs, tvm::ffi::Any*)
     File "tvm/src/runtime/vm/vm.cc", line 618, in operator()
     File "tvm/src/runtime/vm/vm.cc", line 689, in tvm::runtime::vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::ffi::Any, std::allocator<tvm::ffi::Any> > const&)
     File "tvm/src/runtime/vm/vm.cc", line 812, in tvm::runtime::vm::VirtualMachineImpl::RunLoop()
     File "tvm/src/runtime/vm/vm.cc", line 763, in tvm::runtime::vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::vm::VMFrame*, tvm::runtime::vm::Instruction)
     File "tvm/src/runtime/vm/builtin.cc", line 121, in tvm::runtime::vm::MatchShape(tvm::ffi::PackedArgs, tvm::ffi::Any*)
     File "tvm/ffi/include/tvm/ffi/type_traits.h", line 93, in tvm::ffi::TypeTraitsBase::GetMismatchTypeInfo[abi:cxx11](TVMFFIAny const*)
     File "tvm/ffi/include/tvm/ffi/object.h", line 71, in tvm::ffi::TypeIndexToTypeKey[abi:cxx11](int)
     File "tvm/ffi/src/ffi/object.cc", line 455, in TVMFFIGetTypeInfo
     File "tvm/ffi/src/ffi/object.cc", line 187, in tvm::ffi::TypeTable::GetTypeEntry(int)
   InternalError: Check failed: (entry != nullptr) is false: Cannot find type info for type_index=7
   ```
   ### Analysis
   
   - Without RPC: Inputs are `NDArray` → Works fine
   - With RPC: Direct calls convert arrays to `DLTensor*` → Causes this error
   
   This conversion appears intentional: https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/python/tvm/runtime/vm.py#L255-L258
   
   The error occurs because the `MatchShape` function in `src/runtime/vm/builtin.cc` attempts to cast `args[0]` to either `NDArray` or `ffi::Shape`.
   
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/src/runtime/vm/builtin.cc#L115-L122
   
   When calling `vm["relax_matmul"](data, w0)` directly over RPC, `args[0]` is actually a `DLTensor*` (`kTVMFFIDLTensorPtr = 7`). The cast from `DLTensor*` to `ffi::Shape` fails. `TVMFFIGetTypeInfo` is then called to get the type name for the mismatch message, but `DLTensor*` isn't registered in `ffi/src/ffi/object.cc`, so the lookup itself fails with `Cannot find type info for type_index=7`.
   
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/ffi/include/tvm/ffi/any.h#L135-L143
   
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/ffi/src/ffi/object.cc#L277-L304
   
   It is unclear whether this is expected behavior: `DLTensor*` is the only static type that is not registered (it has no corresponding `StaticTypeKey`).
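
The two-stage failure described above can be illustrated with a small toy model. This is only a sketch in plain Python, not TVM code: the `TypeTable` class, the `match_shape` function, and every type index other than 7 are hypothetical names and values chosen for illustration. It shows how a missing table entry turns an ordinary type-mismatch error into a lookup error.

```python
# Toy model of the failure path; this is not TVM code.
# Only the value 7 for DLTensor* comes from the issue; every other
# index and type key below is made up for illustration.
DLTENSOR_PTR = 7  # kTVMFFIDLTensorPtr
SHAPE, NDARRAY, STRING = 100, 101, 102  # hypothetical indices


class TypeTable:
    """Maps a type index to a printable type key."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def type_key(self, type_index):
        if type_index not in self._entries:
            # Mirrors the failed check at the bottom of the traceback.
            raise RuntimeError(
                f"Cannot find type info for type_index={type_index}"
            )
        return self._entries[type_index]


def match_shape(table, arg_type_index):
    """Accept NDArray or Shape; otherwise report the actual type."""
    if arg_type_index in (NDARRAY, SHAPE):
        return "matched"
    # Building the mismatch message needs the argument's type key.
    # If that lookup fails, its error replaces the intended one.
    actual = table.type_key(arg_type_index)
    raise TypeError(f"expected NDArray or Shape, got {actual}")


# DLTensor* is deliberately left out of the table.
TABLE = TypeTable({SHAPE: "Shape", NDARRAY: "NDArray", STRING: "String"})
```

In this model, `match_shape(TABLE, STRING)` raises a `TypeError` that names the offending type, while `match_shape(TABLE, DLTENSOR_PTR)` raises the `RuntimeError` from the table lookup instead, which matches the shape of the error in the traceback.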
   
   `SetInput` uses `ConvertArgToDevice` to convert `DLTensor*` to `NDArray`, so `set_input` + `invoke_stateful` avoids this error.
   
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/src/runtime/vm/vm.cc#L510-L515
   
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/src/runtime/vm/vm.cc#L113-L131
   
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/include/tvm/runtime/memory/memory_manager.h#L62-L70
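
Since the stateful path works over RPC, it can be wrapped in a small helper as a possible stopgap. This is a minimal sketch that assumes only the three VM methods already used in the reproduction script (`set_input`, `invoke_stateful`, `get_outputs`); the helper name `call_stateful` is hypothetical and not part of TVM.

```python
def call_stateful(vm, func_name, *args):
    """Call a VM function through the stateful API.

    `vm` is expected to behave like the relax.VirtualMachine used in the
    reproduction script. This helper is an illustration, not a TVM API.
    """
    # set_input goes through SetInput, which converts DLTensor* arguments
    # to NDArray, so the later shape match sees a supported type.
    vm.set_input(func_name, *args)
    vm.invoke_stateful(func_name)
    return vm.get_outputs(func_name)
```

With the script above, `out = call_stateful(vm, "relax_matmul", data, w0)` would stand in for `vm["relax_matmul"](data, w0)` on the RPC side. It does not address the underlying missing type registration.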
   
   ### Triage
   
   * needs-triage
   * core:ffi
   * core:rpc
   * core:object
   


