happyme531 opened a new issue, #17063:
URL: https://github.com/apache/tvm/issues/17063

   As the title says: when I use TVM MetaSchedule with RPC to run tuning on 
another device, resizing the terminal of the host tuning process makes the RPC 
runner process on the host segfault immediately.
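   
   Resizing a terminal delivers `SIGWINCH` to the foreground process, and a pending signal can interrupt a blocking syscall such as the `recv()` in TVM's RPC loop with `EINTR`. A minimal sketch (not TVM code) that simulates the resize by sending `SIGWINCH` to the current process:
   
   ```python
   import os
   import signal

   # A terminal resize delivers SIGWINCH to the foreground process group.
   # With a handler installed (and no SA_RESTART), blocking syscalls can
   # fail with EINTR when the signal arrives. Simulate the resize by
   # signalling ourselves.
   received = []
   signal.signal(signal.SIGWINCH, lambda signum, frame: received.append(signum))
   os.kill(os.getpid(), signal.SIGWINCH)  # what a terminal resize triggers
   print(received == [signal.SIGWINCH])   # True once the handler has run
   ```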
   
   ### Expected behavior
   
   TVM won't segfault.
   
   ### Actual behavior
   
   ```log
   !!!!!!! TVM encountered a Segfault !!!!!!!
   Stack trace:
     0: tvm::runtime::(anonymous namespace)::backtrace_handler(int)
           at /home/zt/rk3588-nn/tvm/src/runtime/logging.cc:214
     1: 0x00007f935548cadf
     2: tvm::runtime::EnvCAPIRegistry::CheckSignals()
           at /home/zt/rk3588-nn/tvm/src/runtime/registry.cc:186
     3: long tvm::support::RetryCallOnEINTR<tvm::support::TCPSocket::Recv(void*, unsigned long, int)::{lambda()#1}, int (*)()>(tvm::support::TCPSocket::Recv(void*, unsigned long, int)::{lambda()#1}, int (*)())
           at /home/zt/rk3588-nn/tvm/src/runtime/rpc/../../support/errno_handling.h:58
     4: tvm::support::TCPSocket::Recv(void*, unsigned long, int)
           at /home/zt/rk3588-nn/tvm/src/runtime/rpc/../../support/socket.h:481
     5: tvm::runtime::SockChannel::Recv(void*, unsigned long)
           at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_socket_impl.cc:56
     6: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)::$_1::operator()(void*, unsigned long) const
           at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:705
     7: unsigned long tvm::support::RingBuffer::WriteWithCallback<tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)::$_1>(tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)::$_1, unsigned long)
           at /home/zt/rk3588-nn/tvm/src/runtime/rpc/../../support/ring_buffer.h:174
     8: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
           at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:704
     9: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)
           at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:870
     10: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
           at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_endpoint.cc:1087
     11: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
           at /home/zt/rk3588-nn/tvm/src/runtime/rpc/rpc_module.cc:129
   ```
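   
   The trace suggests the crash path: `TCPSocket::Recv` wraps `recv()` in `RetryCallOnEINTR`, which retries whenever the call is interrupted by a signal (here, presumably the `SIGWINCH` from the resize) and invokes the registered `CheckSignals()` hook on each retry; per frame 2, the segfault happens inside that hook. A minimal Python sketch of that control flow (the names mirror the C++ helpers in the trace; this is not TVM's actual API):
   
   ```python
   def retry_call_on_eintr(syscall, check_signals):
       """Sketch of tvm::support::RetryCallOnEINTR: retry a syscall wrapper
       that fails with EINTR, calling the signal-check hook on each retry.
       Per the backtrace, the hook call is where the segfault happens."""
       while True:
           try:
               return syscall()
           except InterruptedError:  # errno == EINTR, e.g. after SIGWINCH
               check_signals()       # crash site (frame 2: CheckSignals)


   # Toy stand-in for recv(): interrupted twice, then succeeds.
   state = {"attempts": 0, "checks": 0}

   def fake_recv():
       state["attempts"] += 1
       if state["attempts"] < 3:
           raise InterruptedError
       return b"payload"

   result = retry_call_on_eintr(
       fake_recv, lambda: state.__setitem__("checks", state["checks"] + 1)
   )
   print(result, state["checks"])  # b'payload' 2
   ```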
   
   ### Environment
   
   Host:
   Manjaro Linux 24.0.1
   TVM master branch 78a1f80bf24f1a1114f2ed7d17563d267bb38cc9
   
   Device: 
   RK3588 ARM SoC
   Debian 11
   TVM master branch 78a1f80bf24f1a1114f2ed7d17563d267bb38cc9
   
   ### Steps to reproduce
   
   ```python
   # %%
   import tvm
   from tvm import relay
   from tvm import relax
   from tvm.relax.frontend.onnx import from_onnx
   from tvm.relax.testing import relay_translator
   from tvm.driver.tvmc.transform import apply_graph_transforms
   import onnx
   import tvm.testing
   import tvm.topi.testing
   from tvm.ir.module import IRModule
   from tvm import meta_schedule as ms
   import tvm.tir.tensor_intrin.arm_cpu 
   from tvm.meta_schedule.runner import (
       EvaluatorConfig,
       LocalRunner,
       PyRunner,
       RPCConfig,
       RPCRunner,
   )
   
   # %%
    target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a76 -num-cores=1")
    onnx_model_path = "yolov5s.onnx"
   shape_dict = {"images": (1, 3, 640, 640)}
   
   # %%
   onnx_model = onnx.load(onnx_model_path)
   mod0, params = relay.frontend.from_onnx(onnx_model, shape_dict)
   mod: IRModule = relay_translator.from_relay(mod0["main"], target, params)
   mod = apply_graph_transforms(
       mod,
       {
           "mixed_precision": True,
           "mixed_precision_calculation_type": "float16",
           "mixed_precision_acc_type": "float16",
       },
   )
   rpc_config = RPCConfig(
       tracker_host="127.0.0.1",
       tracker_port=9190,
       tracker_key="rk3588", 
       session_priority=1,
       session_timeout_sec=10,
   )
   evaluator_config = EvaluatorConfig(
       number=1,
       repeat=1,
       min_repeat_ms=5,
       enable_cpu_cache_flush=True,
   )
   runner = RPCRunner(rpc_config, evaluator_config)
   database = ms.relax_integration.tune_relax(
       mod=mod,
       params=params,
       target=target,
        max_trials_global=10000,  # a larger value can find better schedules but takes longer to search
       runner=runner,
       work_dir="./work2",
       seed=0
   )
   
   # %%
   # Compile the best schedule
    lib = ms.relax_integration.compile_relax(
       database=database,
       mod=mod,
       params=params,
       target=target,
   )
   
   # %%
   import tvm.driver.tvmc.model as tvmc_model
   model = tvmc_model.TVMCModel(mod, params)
    model.export_package(lib, onnx_model_path.replace(".onnx", ".tar"), "aarch64-linux-gnu-gcc")
   ```
   
   ### Triage
   
   * core:rpc
   


-- 
This is an automated message from the Apache Git Service.