happyme531 opened a new issue, #17063:
URL: https://github.com/apache/tvm/issues/17063
As the title says, when I use TVM MetaSchedule with RPC to run tuning on
another device, resizing the terminal of the host tuning process makes the RPC
runner process on the host segfault immediately.
### Expected behavior
TVM won't segfault.
### Actual behavior
```log
!!!!!!! TVM encountered a Segfault !!!!!!!
Stack trace:
  0: tvm::runtime::(anonymous namespace)::backtrace_handler(int)
        at /home/zt/rk3588-nn/tvm/src/runtime/logging.cc:214
  1: 0x00007f935548cadf
  2: tvm::runtime::EnvCAPIRegistry::CheckSignals()
        at /home/zt/rk3588-nn/tvm/src/runtime/registry.cc:186
  3: long tvm::support::RetryCallOnEINTR<tvm::support::TCPSocket::Recv(void*, unsigned long, int)::{lambda()#1}, int (*)()>(tvm::support::TCPSocket::Recv(void*, unsigned long, int)::{lambda()#1}, int (*)())
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/../../support/errno_handling.h:58
  4: tvm::support::TCPSocket::Recv(void*, unsigned long, int)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/../../support/socket.h:481
  5: tvm::runtime::SockChannel::Recv(void*, unsigned long)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_socket_impl.cc:56
  6: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)::$_1::operator()(void*, unsigned long) const
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:705
  7: unsigned long tvm::support::RingBuffer::WriteWithCallback<tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)::$_1>(tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)::$_1, unsigned long)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/../../support/ring_buffer.h:174
  8: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:704
  9: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:870
  10: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:1087
  11: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_module.cc:129
```
### Environment
Host:
Manjaro Linux 24.0.1
TVM master branch 78a1f80bf24f1a1114f2ed7d17563d267bb38cc9
Device:
RK3588 ARM SoC
Debian 11
TVM master branch 78a1f80bf24f1a1114f2ed7d17563d267bb38cc9
### Steps to reproduce
```python
# %%
import tvm
from tvm import relay
from tvm import relax
from tvm.relax.frontend.onnx import from_onnx
from tvm.relax.testing import relay_translator
from tvm.driver.tvmc.transform import apply_graph_transforms
import onnx
import tvm.testing
import tvm.topi.testing
from tvm.ir.module import IRModule
from tvm import meta_schedule as ms
import tvm.tir.tensor_intrin.arm_cpu
from tvm.meta_schedule.runner import (
    EvaluatorConfig,
    LocalRunner,
    PyRunner,
    RPCConfig,
    RPCRunner,
)
# %%
target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a76 -num-cores=1")
onnx_model_path = "yolov5s.onnx"
shape_dict = {"images": (1, 3, 640, 640)}
# %%
onnx_model = onnx.load(onnx_model_path)
mod0, params = relay.frontend.from_onnx(onnx_model, shape_dict)
mod: IRModule = relay_translator.from_relay(mod0["main"], target, params)
mod = apply_graph_transforms(
    mod,
    {
        "mixed_precision": True,
        "mixed_precision_calculation_type": "float16",
        "mixed_precision_acc_type": "float16",
    },
)
rpc_config = RPCConfig(
    tracker_host="127.0.0.1",
    tracker_port=9190,
    tracker_key="rk3588",
    session_priority=1,
    session_timeout_sec=10,
)
evaluator_config = EvaluatorConfig(
    number=1,
    repeat=1,
    min_repeat_ms=5,
    enable_cpu_cache_flush=True,
)
runner = RPCRunner(rpc_config, evaluator_config)
database = ms.relax_integration.tune_relax(
    mod=mod,
    params=params,
    target=target,
    max_trials_global=10000,  # larger value for better performance, but takes longer to search
    runner=runner,
    work_dir="./work2",
    seed=0,
)
# %%
# Compile the best schedule
lib = ms.relay_integration.compile_relay(
    database=database,
    mod=mod,
    params=params,
    target=target,
)
# %%
import tvm.driver.tvmc.model as tvmc_model
model = tvmc_model.TVMCModel(mod, params)
model.export_package(
    lib, onnx_model_path.replace(".onnx", ".tar"), "aarch64-linux-gnu-gcc"
)
```
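Resizing a terminal delivers `SIGWINCH` to the foreground process, and the stack trace passes through `RetryCallOnEINTR` and `CheckSignals`, so the crash is presumably triggered by that signal interrupting the blocking `recv`. Under that assumption, it may be possible to trigger the same condition without touching the terminal by sending `SIGWINCH` to the tuning process by hand. The sketch below only demonstrates the signal delivery itself, using the Python standard library. The helper name `simulate_resize` and the `received` list are hypothetical, and the script has not been confirmed to reproduce the segfault in TVM.

```python
import os
import signal

# Signal numbers seen by the handler; used only to show delivery worked.
received = []


def on_winch(signum, frame):
    """Record that SIGWINCH arrived (the signal a terminal resize sends)."""
    received.append(signum)


def simulate_resize(pid):
    """Hypothetical helper: send SIGWINCH to `pid`, as a terminal resize would."""
    os.kill(pid, signal.SIGWINCH)


# Install the handler, signal this process, then restore the old handler.
previous = signal.signal(signal.SIGWINCH, on_winch)
simulate_resize(os.getpid())
signal.signal(signal.SIGWINCH, previous)
```

To try this against a real tuning run, one would call `simulate_resize` with the PID of the host tuning process from a second shell (equivalent to `kill -WINCH <pid>`) while the RPC runner is waiting on the device.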
### Triage
* core:rpc