MehdiTantaoui-99 opened a new issue, #17495:
URL: https://github.com/apache/tvm/issues/17495

   I ran tuning on an ONNX model using the `tvmc` Python API, but after roughly half of the tasks have been tuned (15 of the 27 tasks in the table below), an error is thrown that stops tuning and forces a restart from the beginning. This has happened multiple times.
   
   ```python
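   from tvm.driver import tvmc

   # `model` is a TVMCModel (e.g. loaded with tvmc.load); `target`,
   # `tuning_records`, and `args` are defined earlier in the script.

   # Perform actual tuning with selected tasks
   tvmc.tune(
       model,
       target=target,
       tuning_records=tuning_records,
       enable_autoscheduler=args.enable_autoscheduler,
       trials=args.tuning_trials,
       early_stopping=args.early_stopping,
       timeout=20,
   )
   print("Tuning completed.")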
   ```
   ```
   ----------------------------------------------------------------------
   |  ID  |                       Task Description                        | Latency (ms) | Speed (GFLOPS) | Trials |
   -----------------------------------------------------------------------------------------------------------------
   |    0 |                                    vm_mod_fused_nn_conv2d_add |        0.012 |         652.45 |     18 |
   |    1 |                          vm_mod_fused_nn_conv2d_add_nn_relu_5 |        0.084 |        3351.34 |     18 |
   |    2 |                              vm_mod_fused_nn_conv2d_add_add_3 |        0.028 |        4974.26 |     18 |
   |    3 |                          vm_mod_fused_nn_conv2d_add_nn_relu_1 |        0.169 |        4028.20 |     18 |
   |    4 |                        vm_mod_fused_nn_conv2d_add_add_nn_relu |        0.304 |        5958.98 |     18 |
   |    5 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_5 |        0.129 |        3509.89 |     18 |
   |    6 |                          vm_mod_fused_nn_conv2d_add_nn_relu_8 |        0.124 |        1992.50 |     18 |
   |    7 |                              vm_mod_fused_nn_conv2d_add_add_1 |        0.087 |        3123.05 |     18 |
   |    8 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_3 |        0.255 |        4438.36 |     18 |
   |    9 |                          vm_mod_fused_nn_conv2d_add_nn_relu_4 |        0.267 |        5502.94 |     18 |
   |   10 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_7 |        0.082 |        3001.29 |     18 |
   |   11 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_1 |        0.426 |        5669.90 |     18 |
   |   12 |                              vm_mod_fused_nn_conv2d_add_add_6 |        0.023 |        2781.69 |     18 |
   |   13 |                            vm_mod_fused_nn_conv2d_add_nn_relu |        0.170 |        5459.73 |     18 |
   |   14 |                          vm_mod_fused_nn_conv2d_add_nn_relu_7 |        0.165 |        3657.21 |     18 |
   |   15 |                                vm_mod_fused_nn_conv2d_add_add |            - |              - |      0 |
   |   16 |                              vm_mod_fused_nn_conv2d_add_add_4 |            - |              - |      0 |
   |   17 |                          vm_mod_fused_nn_conv2d_add_nn_relu_3 |            - |              - |      0 |
   |   18 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_6 |            - |              - |      0 |
   |   19 |                              vm_mod_fused_nn_conv2d_add_add_2 |            - |              - |      0 |
   |   20 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_4 |            - |              - |      0 |
   |   21 |                          vm_mod_fused_nn_conv2d_add_nn_relu_6 |            - |              - |      0 |
   |   22 |                            vm_mod_fused_nn_conv2d_add_sigmoid |            - |              - |      0 |
   |   23 |                          vm_mod_fused_nn_conv2d_add_nn_relu_2 |            - |              - |      0 |
   |   24 |                      vm_mod_fused_nn_conv2d_add_add_nn_relu_2 |            - |              - |      0 |
   |   25 |                              vm_mod_fused_nn_conv2d_add_add_7 |            - |              - |      0 |
   |   26 |                              vm_mod_fused_nn_conv2d_add_add_5 |            - |              - |      0 |
   -----------------------------------------------------------------------------------------------------------------
   ```
   ### Expected behavior
   
   Tuning should complete for all tasks.
   
   ### Actual behavior
   
   Tuning aborts with the following error:
   ```
   terminate called after throwing an instance of 'tvm::runtime::InternalError'
     what():  [13:54:11] /home/ubuntu/tvm/src/runtime/cuda/cuda_device_api.cc:312: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: misaligned address
   Stack trace:
     0: tvm::runtime::CUDATimerNode::~CUDATimerNode()
           at /home/ubuntu/tvm/src/runtime/cuda/cuda_device_api.cc:312
     1: tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::CUDATimerNode>::Deleter_(tvm::runtime::Object*)
           at /home/ubuntu/tvm/include/tvm/runtime/memory.h:138
     2: tvm::runtime::ObjectPtr<tvm::runtime::Object>::reset()
           at /home/ubuntu/tvm/include/tvm/runtime/object.h:455
     3: tvm::runtime::ObjectPtr<tvm::runtime::Object>::~ObjectPtr()
           at /home/ubuntu/tvm/include/tvm/runtime/object.h:404
     4: tvm::runtime::ObjectRef::~ObjectRef()
           at /home/ubuntu/tvm/include/tvm/runtime/object.h:519
     5: tvm::runtime::Timer::~Timer()
           at /home/ubuntu/tvm/include/tvm/runtime/profiling.h:86
     6: operator()
           at /home/ubuntu/tvm/src/runtime/profiling.cc:915
     7: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
           at /home/ubuntu/tvm/src/runtime/rpc/rpc_local_session.cc:107
     8: tvm::runtime::RPCSession::AsyncCallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::RPCCode, tvm::runtime::TVMArgs)>)
           at /home/ubuntu/tvm/src/runtime/rpc/rpc_session.cc:47
     9: tvm::runtime::RPCEndpoint::EventHandler::HandleNormalCallFunc()
           at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:542
     10: tvm::runtime::RPCEndpoint::EventHandler::HandleProcessPacket(std::function<void (tvm::runtime::TVMArgs)>)
           at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:362
     11: tvm::runtime::RPCEndpoint::EventHandler::HandleNextEvent(bool, bool, std::function<void (tvm::runtime::TVMArgs)>)
           at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:136
     12: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
           at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:714
     13: tvm::runtime::RPCEndpoint::ServerLoop()
           at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:805
     14: tvm::runtime::RPCServerLoop(int)
           at /home/ubuntu/tvm/src/runtime/rpc/rpc_socket_impl.cc:119
     15: operator()
           at /home/ubuntu/tvm/src/runtime/rpc/rpc_socket_impl.cc:138
   
   Exception in thread Thread-1 (_listen_loop):
   Traceback (most recent call last):
     File "/home/ubuntu/miniconda3/envs/tvm-build-venv/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
       self.run()
     File "/home/ubuntu/miniconda3/envs/tvm-build-venv/lib/python3.11/threading.py", line 982, in run
       self._target(*self._args, **self._kwargs)
     File "/home/ubuntu/tvm/python/tvm/rpc/server.py", line 279, in _listen_loop
       _serving(conn, addr, opts, load_library)
     File "/home/ubuntu/tvm/python/tvm/rpc/server.py", line 168, in _serving
       raise RuntimeError(
   RuntimeError: Child process 49293 exited unsuccessfully with error code -6
   ```
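The `error code -6` in the last line of the log is a negative exit status. Assuming the RPC server's child is a Python `multiprocessing.Process` (the `RuntimeError` raised from `tvm/rpc/server.py` suggests so), a value of -N means the child was terminated by signal N, and signal 6 on Linux is `SIGABRT`. That is consistent with the `terminate called after throwing an instance of 'tvm::runtime::InternalError'` line: an uncaught C++ exception calls `std::terminate`, which by default calls `abort()`. A minimal sketch for decoding such codes with only the standard library (the helper `describe_exit_code` is hypothetical, not part of TVM):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a multiprocessing-style exit code.

    Python's multiprocessing reports -N when a child process was
    terminated by signal N; 0 or a positive value is a normal exit status.
    """
    if code < 0:
        try:
            # signal.Signals maps signal numbers to names (6 -> SIGABRT on Linux).
            return signal.Signals(-code).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"unknown signal {-code}"
    return f"exit status {code}"


if __name__ == "__main__":
    # Decode the code reported by the RPC server in the log above.
    print(describe_exit_code(-6))
```

On Linux, `describe_exit_code(-6)` returns `"SIGABRT"`, so the abort happens in the native RPC worker process, and the Python traceback only reports that child's exit.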
   
   ### Environment
   
   ```
   PRETTY_NAME="Ubuntu 22.04.5 LTS"
   NAME="Ubuntu"
   VERSION_ID="22.04"
   VERSION="22.04.5 LTS (Jammy Jellyfish)"
   VERSION_CODENAME=jammy
   ID=ubuntu
   ID_LIKE=debian
   HOME_URL="https://www.ubuntu.com/"
   SUPPORT_URL="https://help.ubuntu.com/"
   BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
   PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
   UBUNTU_CODENAME=jammy
   ```
   
   ```
   tvm version 0.19.dev0
   ```
   
   

