This is an automated email from the ASF dual-hosted git repository.
tlopex pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git
The following commit(s) were added to refs/heads/main by this push:
new 45bef4579c [CUDA] Fix cuModuleUnload crash during interpreter shutdown
(#18624)
45bef4579c is described below
commit 45bef4579cc6411eff7fc3344b76ee0ce13d32e7
Author: Guan-Ming (Wesley) Chiu <[email protected]>
AuthorDate: Mon Dec 29 19:23:42 2025 +0800
[CUDA] Fix cuModuleUnload crash during interpreter shutdown (#18624)
## Related
CI error in #18614
## Why
The CUDAModuleNode destructor was using CUDA_DRIVER_CALL and CUDA_CALL
macros that call LOG(FATAL) (throw an exception) when CUDA operations
fail.
During interpreter shutdown, the CUDA context can become invalid,
causing CUDA_ERROR_ILLEGAL_ADDRESS when cuModuleUnload is called.
Destructors are implicitly noexcept, so an exception escaping one calls
std::terminate and crashes the process.
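A small sketch of why a throwing check macro is unsafe in a destructor. The type names `ImplicitDtor` and `OptOutDtor` are hypothetical and used only for illustration. Since C++11 a destructor is `noexcept` unless it is declared otherwise, so an exception that leaves it cannot propagate to a caller:

```cpp
#include <type_traits>

// Hypothetical type: a destructor with no exception specification,
// like the one in the class touched by this commit.
struct ImplicitDtor {
  ~ImplicitDtor() {}
};

// Hypothetical type: a destructor that explicitly opts out of noexcept.
struct OptOutDtor {
  ~OptOutDtor() noexcept(false) {}
};

// A plain destructor is implicitly noexcept, so a throw inside it
// ends in std::terminate instead of reaching a catch block.
static_assert(std::is_nothrow_destructible<ImplicitDtor>::value,
              "destructors are noexcept by default");

// Only an explicit noexcept(false) changes that.
static_assert(!std::is_nothrow_destructible<OptOutDtor>::value,
              "noexcept(false) must be requested explicitly");
```

Both assertions are checked at compile time, so the block has no runtime behavior.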
## How
1. Remove the throwing macros from the destructor
2. Check the cudaSetDevice return value and skip the unload if it fails
with anything other than cudaErrorCudartUnloading
3. Ignore errors from cuModuleUnload - during shutdown these are benign
since the OS will reclaim resources anyway
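The steps above can be sketched without a GPU. This is a minimal illustration, not the TVM code: `Status`, `FakeDriver`, `ModuleHolder`, and `UnloadAttemptsAfterDestroy` are hypothetical names standing in for the CUDA status codes, the CUDA calls, and `CUDAModuleNode`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical status codes standing in for cudaError_t / CUresult.
enum class Status { kOk, kRuntimeUnloading, kError };

// Hypothetical fake driver so the sketch runs without CUDA.
struct FakeDriver {
  std::vector<Status> set_device_result;  // result of SetDevice per device
  int unload_attempts = 0;

  Status SetDevice(std::size_t device) const { return set_device_result[device]; }

  // Always fails, imitating a context that is already shutting down.
  Status UnloadModule(int /*handle*/) {
    ++unload_attempts;
    return Status::kError;
  }
};

// Hypothetical owner of one module handle per device (0 means "not loaded").
class ModuleHolder {
 public:
  ModuleHolder(FakeDriver* driver, std::vector<int> handles)
      : driver_(driver), handles_(std::move(handles)) {}

  ~ModuleHolder() {
    for (std::size_t i = 0; i < handles_.size(); ++i) {
      if (handles_[i] == 0) continue;
      // Check the status by hand instead of using a throwing macro.
      Status set_status = driver_->SetDevice(i);
      if (set_status != Status::kOk && set_status != Status::kRuntimeUnloading) {
        continue;  // device cannot be selected, skip this unload
      }
      // Best-effort unload: the result is read and deliberately ignored.
      Status unload_status = driver_->UnloadModule(handles_[i]);
      (void)unload_status;
    }
  }

 private:
  FakeDriver* driver_;
  std::vector<int> handles_;
};

// Destroys a holder and reports how many unloads were attempted.
int UnloadAttemptsAfterDestroy(std::vector<Status> set_device_result,
                               std::vector<int> handles) {
  FakeDriver driver;
  driver.set_device_result = std::move(set_device_result);
  {
    ModuleHolder holder(&driver, std::move(handles));
  }  // destructor runs here and must not throw
  return driver.unload_attempts;
}
```

In this sketch every unload fails, yet destruction completes normally. Devices whose `SetDevice` call returns `kError` are skipped, while `kOk` and `kRuntimeUnloading` both proceed to the unload attempt.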
---
src/runtime/cuda/cuda_module.cc | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/runtime/cuda/cuda_module.cc b/src/runtime/cuda/cuda_module.cc
index f07996c68b..19f4288c97 100644
--- a/src/runtime/cuda/cuda_module.cc
+++ b/src/runtime/cuda/cuda_module.cc
@@ -60,8 +60,13 @@ class CUDAModuleNode : public ffi::ModuleObj {
   ~CUDAModuleNode() {
     for (size_t i = 0; i < module_.size(); ++i) {
       if (module_[i] != nullptr) {
-        CUDA_CALL(cudaSetDevice(static_cast<int>(i)));
-        CUDA_DRIVER_CALL(cuModuleUnload(module_[i]));
+        cudaError_t set_err = cudaSetDevice(static_cast<int>(i));
+        if (set_err != cudaSuccess && set_err != cudaErrorCudartUnloading) {
+          continue;
+        }
+        CUresult result = cuModuleUnload(module_[i]);
+        // Ignore errors during cleanup - context may be shutting down
+        (void)result;
       }
     }
   }