w1049 opened a new issue, #18108:
URL: https://github.com/apache/tvm/issues/18108
### Expected behavior
The VM should work correctly for both direct calls and `set_input` +
`invoke_stateful`, whether running locally or over RPC.
### Actual behavior
When the VM is used over RPC, direct calls fail, while `set_input` +
`invoke_stateful` works fine.
### Environment
commit 956b65910c475789be4ea5ec0a39a81af99bcfd8
### Steps to reproduce
Execute the script:
```python
import numpy as np
import tvm
from tvm import rpc
from tvm.contrib import utils
from tvm import relax
from tvm.script import relax as R
# A simple Relax function
@R.function
def relax_matmul(
    data: R.Tensor(("n", 784), dtype="float32"),
    w0: R.Tensor((784, 128), dtype="float32"),
) -> R.Tensor(("n", 128), dtype="float32"):
    with R.dataflow():
        lv0 = R.matmul(data, w0)
        R.output(lv0)
    return lv0
mod = tvm.IRModule.from_expr(relax_matmul)
target = "llvm"
ex = tvm.compile(mod, target=target)
# ===== Local Execution =====
dev = tvm.cpu()
vm = relax.VirtualMachine(ex, dev)
data = tvm.nd.array(np.random.rand(1024, 784).astype("float32"), dev)
w0 = tvm.nd.array(np.random.rand(784, 128).astype("float32"), dev)
vm.set_input("relax_matmul", data, w0)
vm.invoke_stateful("relax_matmul")
out = vm.get_outputs("relax_matmul")
print("Invoke stateful:", out.shape)
out = vm["relax_matmul"](data, w0)
print("Direct:", out.shape)
# ===== RPC Execution =====
temp = utils.tempdir()
path = temp.relpath("lib.tar")
ex.export_library(path)
remote = rpc.LocalSession()
remote.upload(path)
rt_mod = remote.load_module("lib.tar")
dev = remote.cpu()
vm = relax.VirtualMachine(rt_mod, dev)
data = tvm.nd.array(np.random.rand(1024, 784).astype("float32"), dev)
w0 = tvm.nd.array(np.random.rand(784, 128).astype("float32"), dev)
vm.set_input("relax_matmul", data, w0)
vm.invoke_stateful("relax_matmul")
out = vm.get_outputs("relax_matmul")
print("RPC Invoke stateful:", out.shape)
out = vm["relax_matmul"](data, w0)
print("RPC Direct:", out.shape)
```
The output is:
```
Invoke stateful: (1024, 128)
Direct: (1024, 128)
2025-07-03 10:56:13.950 INFO load_module /tmp/tmpbr579u0v/lib.tar
RPC Invoke stateful: (1024, 128)
Exception caught during TVMFFIGetTypeInfo:
Traceback (most recent call last):
  File "<unknown>", in __pyx_pw_3tvm_3ffi_4core_8Function_1__call__(_object*, _object* const*, long, _object*)
  File "<unknown>", in __pyx_f_3tvm_3ffi_4core_FuncCall(void*, _object*, TVMFFIAny*, int*) [clone .constprop.0]
  File "tvm/src/runtime/rpc/rpc_module.cc", line 132, in tvm::runtime::RPCWrappedFunc::operator()(tvm::ffi::PackedArgs, tvm::ffi::Any*) const
  File "tvm/src/runtime/rpc/rpc_local_session.cc", line 102, in tvm::runtime::LocalSession::CallFunc(void*, tvm::ffi::PackedArgs, std::function<void (tvm::ffi::PackedArgs)> const&)
  File "tvm/src/runtime/vm/vm.cc", line 545, in tvm::runtime::vm::VirtualMachineImpl::InvokeClosurePacked(tvm::ffi::ObjectRef const&, tvm::ffi::PackedArgs, tvm::ffi::Any*)
  File "tvm/src/runtime/vm/vm.cc", line 618, in operator()
  File "tvm/src/runtime/vm/vm.cc", line 689, in tvm::runtime::vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::ffi::Any, std::allocator<tvm::ffi::Any> > const&)
  File "tvm/src/runtime/vm/vm.cc", line 812, in tvm::runtime::vm::VirtualMachineImpl::RunLoop()
  File "tvm/src/runtime/vm/vm.cc", line 763, in tvm::runtime::vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::vm::VMFrame*, tvm::runtime::vm::Instruction)
  File "tvm/src/runtime/vm/builtin.cc", line 121, in tvm::runtime::vm::MatchShape(tvm::ffi::PackedArgs, tvm::ffi::Any*)
  File "tvm/ffi/include/tvm/ffi/type_traits.h", line 93, in tvm::ffi::TypeTraitsBase::GetMismatchTypeInfo[abi:cxx11](TVMFFIAny const*)
  File "tvm/ffi/include/tvm/ffi/object.h", line 71, in tvm::ffi::TypeIndexToTypeKey[abi:cxx11](int)
  File "tvm/ffi/src/ffi/object.cc", line 455, in TVMFFIGetTypeInfo
  File "tvm/ffi/src/ffi/object.cc", line 187, in tvm::ffi::TypeTable::GetTypeEntry(int)
InternalError: Check failed: (entry != nullptr) is false: Cannot find type info for type_index=7
```
### Analysis
- Without RPC: Inputs are `NDArray` → Works fine
- With RPC: Direct calls convert arrays to `DLTensor*` → Causes this error
This conversion appears intentional:
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/python/tvm/runtime/vm.py#L255-L258
The error occurs because in `src/runtime/vm/builtin.cc`, the `MatchShape`
function attempts to cast `args[0]` to either `NDArray` or `ffi::Shape`.
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/src/runtime/vm/builtin.cc#L115-L122
When directly calling `vm["relax_matmul"](data, w0)` over RPC, `args[0]` is
actually a `DLTensor*` (`kTVMFFIDLTensorPtr = 7`). The cast fails when trying
to convert from `DLTensor*` to `ffi::Shape`. Then `TVMFFIGetTypeInfo` is called
to get the type name, but `DLTensor*` isn't registered in
`ffi/src/ffi/object.cc`.
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/ffi/include/tvm/ffi/any.h#L135-L143
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/ffi/src/ffi/object.cc#L277-L304
It is unclear whether this is expected behavior: `DLTensor*` appears to be the
only static type that is not registered (it has no corresponding `StaticTypeKey`).
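The failure path described above can be modeled in a few lines of plain Python. This is only a toy sketch, not TVM code: the table contents, the names `TYPE_TABLE`, `type_index_to_key`, and `match_shape`, and every index except `7` (taken from `kTVMFFIDLTensorPtr = 7` above) are assumptions made for illustration. It shows how a type mismatch that should produce a readable error instead fails inside the error-reporting lookup when the actual type has no registered entry.

```python
# Toy model of the failure path; NOT TVM code.
# Only the value 7 for DLTensor* comes from the issue. All other
# indices and names are hypothetical and chosen for illustration.
K_DLTENSOR_PTR = 7
INT_INDEX = 1        # hypothetical index of a registered static type
SHAPE_INDEX = 100    # hypothetical index for Shape
NDARRAY_INDEX = 101  # hypothetical index for NDArray

# Hypothetical type table: DLTensor* (7) is deliberately missing.
TYPE_TABLE = {
    INT_INDEX: "int",
    SHAPE_INDEX: "Shape",
    NDARRAY_INDEX: "NDArray",
}


def type_index_to_key(type_index):
    """Look up a type name; fail if the index was never registered."""
    entry = TYPE_TABLE.get(type_index)
    if entry is None:
        raise RuntimeError(f"Cannot find type info for type_index={type_index}")
    return entry


def match_shape(arg_type_index):
    """Accept NDArray or Shape; otherwise report the actual type by name."""
    if arg_type_index in (NDARRAY_INDEX, SHAPE_INDEX):
        return "ok"
    # Building the mismatch message requires the name of the actual type.
    # For an unregistered index this lookup raises before the intended
    # TypeError can be constructed.
    actual = type_index_to_key(arg_type_index)
    raise TypeError(f"expected NDArray or Shape, got {actual}")
```

In this model, a registered but unexpected type produces the intended `TypeError`, while index `7` surfaces as the lookup failure instead, which mirrors the `Cannot find type info for type_index=7` message in the traceback.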
`SetInput` uses `ConvertArgToDevice` to convert `DLTensor*` to `NDArray`, so
`set_input` + `invoke_stateful` avoids this error.
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/src/runtime/vm/vm.cc#L510-L515
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/src/runtime/vm/vm.cc#L113-L131
https://github.com/apache/tvm/blob/956b65910c475789be4ea5ec0a39a81af99bcfd8/include/tvm/runtime/memory/memory_manager.h#L62-L70
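Until the direct-call path is fixed, the stateful sequence from the reproduction script can be wrapped in a small helper. This is a minimal sketch, assuming only the three methods already used above (`set_input`, `invoke_stateful`, `get_outputs`); the helper name `call_stateful` is hypothetical and not part of TVM.

```python
def call_stateful(vm, func_name, *args):
    """Run `func_name` on `vm` through the stateful API.

    Hypothetical helper: it replays the set_input / invoke_stateful /
    get_outputs sequence from the reproduction script so that callers
    can avoid the direct call `vm[func_name](*args)` over RPC.
    """
    vm.set_input(func_name, *args)
    vm.invoke_stateful(func_name)
    return vm.get_outputs(func_name)
```

With the RPC objects from the script, `out = call_stateful(vm, "relax_matmul", data, w0)` would stand in for `out = vm["relax_matmul"](data, w0)`.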
### Triage
* needs-triage
* core:ffi
* core:rpc
* core:object