MehdiTantaoui-99 opened a new issue, #17495:
URL: https://github.com/apache/tvm/issues/17495
I ran tuning on an ONNX model using the `tvmc` Python API, but roughly halfway through the tasks it throws an error that aborts the tuning and forces a restart from the beginning (this has happened multiple times).
```python
# Perform the actual tuning with the selected tasks
tvmc.tune(
    model,
    target=target,
    tuning_records=tuning_records,
    enable_autoscheduler=args.enable_autoscheduler,
    trials=args.tuning_trials,
    early_stopping=args.early_stopping,
    timeout=20,
)
print("Tuning completed.")
```
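Since the tuning log is written to `tuning_records` as trials finish, a crash partway through need not discard the completed work. A possible mitigation (a sketch, not tested against this issue) is a generic retry wrapper that re-invokes the tuning attempt and feeds the partial log back in; with `tvmc` this would rely on the `prior_records` parameter of `tvmc.tune`, which is an assumption about the installed version:

```python
import os


def tune_with_retries(tune_fn, tuning_records, max_attempts=3):
    """Re-invoke a tuning callable after a crash, passing the partial
    log back in so already-finished trials are not repeated.

    tune_fn(prior_records) runs one tuning attempt; prior_records is
    None on the first try and the partial log path on later tries.
    """
    for attempt in range(1, max_attempts + 1):
        # Resume from the partial log if a previous attempt wrote one.
        prior = tuning_records if os.path.exists(tuning_records) else None
        try:
            tune_fn(prior)
            return True  # tuning completed
        except RuntimeError as err:
            print(f"tuning attempt {attempt} failed: {err}")
    return False
```

With `tvmc` the callable could be `lambda prior: tvmc.tune(model, target=target, tuning_records=tuning_records, prior_records=prior, ...)`; again, `prior_records` is assumed to be supported by the `tvmc` version in use.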
```
------------------------------------------------------------------------------------------------
| ID | Task Description                           | Latency (ms) | Speed (GFLOPS) | Trials |
------------------------------------------------------------------------------------------------
|  0 | vm_mod_fused_nn_conv2d_add                 |        0.012 |         652.45 |     18 |
|  1 | vm_mod_fused_nn_conv2d_add_nn_relu_5       |        0.084 |        3351.34 |     18 |
|  2 | vm_mod_fused_nn_conv2d_add_add_3           |        0.028 |        4974.26 |     18 |
|  3 | vm_mod_fused_nn_conv2d_add_nn_relu_1       |        0.169 |        4028.20 |     18 |
|  4 | vm_mod_fused_nn_conv2d_add_add_nn_relu     |        0.304 |        5958.98 |     18 |
|  5 | vm_mod_fused_nn_conv2d_add_add_nn_relu_5   |        0.129 |        3509.89 |     18 |
|  6 | vm_mod_fused_nn_conv2d_add_nn_relu_8       |        0.124 |        1992.50 |     18 |
|  7 | vm_mod_fused_nn_conv2d_add_add_1           |        0.087 |        3123.05 |     18 |
|  8 | vm_mod_fused_nn_conv2d_add_add_nn_relu_3   |        0.255 |        4438.36 |     18 |
|  9 | vm_mod_fused_nn_conv2d_add_nn_relu_4       |        0.267 |        5502.94 |     18 |
| 10 | vm_mod_fused_nn_conv2d_add_add_nn_relu_7   |        0.082 |        3001.29 |     18 |
| 11 | vm_mod_fused_nn_conv2d_add_add_nn_relu_1   |        0.426 |        5669.90 |     18 |
| 12 | vm_mod_fused_nn_conv2d_add_add_6           |        0.023 |        2781.69 |     18 |
| 13 | vm_mod_fused_nn_conv2d_add_nn_relu         |        0.170 |        5459.73 |     18 |
| 14 | vm_mod_fused_nn_conv2d_add_nn_relu_7       |        0.165 |        3657.21 |     18 |
| 15 | vm_mod_fused_nn_conv2d_add_add             |            - |              - |      0 |
| 16 | vm_mod_fused_nn_conv2d_add_add_4           |            - |              - |      0 |
| 17 | vm_mod_fused_nn_conv2d_add_nn_relu_3       |            - |              - |      0 |
| 18 | vm_mod_fused_nn_conv2d_add_add_nn_relu_6   |            - |              - |      0 |
| 19 | vm_mod_fused_nn_conv2d_add_add_2           |            - |              - |      0 |
| 20 | vm_mod_fused_nn_conv2d_add_add_nn_relu_4   |            - |              - |      0 |
| 21 | vm_mod_fused_nn_conv2d_add_nn_relu_6       |            - |              - |      0 |
| 22 | vm_mod_fused_nn_conv2d_add_sigmoid         |            - |              - |      0 |
| 23 | vm_mod_fused_nn_conv2d_add_nn_relu_2       |            - |              - |      0 |
| 24 | vm_mod_fused_nn_conv2d_add_add_nn_relu_2   |            - |              - |      0 |
| 25 | vm_mod_fused_nn_conv2d_add_add_7           |            - |              - |      0 |
| 26 | vm_mod_fused_nn_conv2d_add_add_5           |            - |              - |      0 |
------------------------------------------------------------------------------------------------
```
### Expected behavior
Tuning runs to completion across all tasks.
### Actual behavior
The run aborts with the following error:
```
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what(): [13:54:11] /home/ubuntu/tvm/src/runtime/cuda/cuda_device_api.cc:312: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: misaligned address
Stack trace:
  0: tvm::runtime::CUDATimerNode::~CUDATimerNode()
        at /home/ubuntu/tvm/src/runtime/cuda/cuda_device_api.cc:312
  1: tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::CUDATimerNode>::Deleter_(tvm::runtime::Object*)
        at /home/ubuntu/tvm/include/tvm/runtime/memory.h:138
  2: tvm::runtime::ObjectPtr<tvm::runtime::Object>::reset()
        at /home/ubuntu/tvm/include/tvm/runtime/object.h:455
  3: tvm::runtime::ObjectPtr<tvm::runtime::Object>::~ObjectPtr()
        at /home/ubuntu/tvm/include/tvm/runtime/object.h:404
  4: tvm::runtime::ObjectRef::~ObjectRef()
        at /home/ubuntu/tvm/include/tvm/runtime/object.h:519
  5: tvm::runtime::Timer::~Timer()
        at /home/ubuntu/tvm/include/tvm/runtime/profiling.h:86
  6: operator()
        at /home/ubuntu/tvm/src/runtime/profiling.cc:915
  7: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_local_session.cc:107
  8: tvm::runtime::RPCSession::AsyncCallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::RPCCode, tvm::runtime::TVMArgs)>)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_session.cc:47
  9: tvm::runtime::RPCEndpoint::EventHandler::HandleNormalCallFunc()
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:542
  10: tvm::runtime::RPCEndpoint::EventHandler::HandleProcessPacket(std::function<void (tvm::runtime::TVMArgs)>)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:362
  11: tvm::runtime::RPCEndpoint::EventHandler::HandleNextEvent(bool, bool, std::function<void (tvm::runtime::TVMArgs)>)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:136
  12: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:714
  13: tvm::runtime::RPCEndpoint::ServerLoop()
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_endpoint.cc:805
  14: tvm::runtime::RPCServerLoop(int)
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_socket_impl.cc:119
  15: operator()
        at /home/ubuntu/tvm/src/runtime/rpc/rpc_socket_impl.cc:138
Exception in thread Thread-1 (_listen_loop):
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/tvm-build-venv/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/home/ubuntu/miniconda3/envs/tvm-build-venv/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/tvm/python/tvm/rpc/server.py", line 279, in _listen_loop
    _serving(conn, addr, opts, load_library)
  File "/home/ubuntu/tvm/python/tvm/rpc/server.py", line 168, in _serving
    raise RuntimeError(
RuntimeError: Child process 49293 exited unsuccessfully with error code -6
```
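Because CUDA reports errors asynchronously, the trace above surfaces the failure in the timer's destructor rather than at the kernel that actually performed the misaligned access. One way to narrow it down (a standard CUDA debugging step, not specific to this issue) is to force synchronous kernel launches when reproducing; `tune_script.py` below is a placeholder name for whatever script calls `tvmc.tune`:

```shell
# Force every CUDA launch to synchronize so the "misaligned address"
# error is reported at the offending kernel, not at a later teardown.
export CUDA_LAUNCH_BLOCKING=1
python tune_script.py  # placeholder for the script that calls tvmc.tune
```

Running under `compute-sanitizer` would similarly pinpoint the faulting kernel and address.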
### Environment
```
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```
```
tvm version 0.19.dev0
```