joelnn commented on issue #19379: URL: https://github.com/apache/incubator-mxnet/issues/19379#issuecomment-1145325082
This is more important now that #19378 was reverted. I am seeing the segfault described in #19360:

```
(gdb) bt
#0  0x00007efccaa7d277 in raise () from /lib64/libc.so.6
#1  0x00007efccaa7e968 in abort () from /lib64/libc.so.6
#2  0x00007efccaabfd37 in __libc_message () from /lib64/libc.so.6
#3  0x00007efccaac8499 in _int_free () from /lib64/libc.so.6
#4  0x00007efccaa80c00 in __run_exit_handlers () from /lib64/libc.so.6
#5  0x00007efccaa80c27 in exit () from /lib64/libc.so.6
#6  0x00007efc5531808d in ?? () from <snip>/mxnet/lib/python3.9/site-packages/mxnet/libmxnet.so
#7  <signal handler called>
#8  0x00007efbc1836061 in ?? () from /opt/apps/cudnn/8.2.4_cuda10.2/lib64/libcudnn_ops_infer.so.8
#9  0x00007efbc1861c00 in ?? () from /opt/apps/cudnn/8.2.4_cuda10.2/lib64/libcudnn_ops_infer.so.8
#10 0x00007efbc0fa7edf in cudnnDestroy () from /opt/apps/cudnn/8.2.4_cuda10.2/lib64/libcudnn_ops_infer.so.8
#11 0x00007efc552619b6 in mshadow::Stream<mshadow::gpu>::DestroyDnnHandle() () from <snip>/mxnet/lib/python3.9/site-packages/mxnet/libmxnet.so
#12 0x00007efc55261b78 in void mshadow::DeleteStream<mshadow::gpu>(mshadow::Stream<mshadow::gpu>*) () from <snip>/mxnet/lib/python3.9/site-packages/mxnet/libmxnet.so
#13 0x00007efc55276607 in void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&) () from <snip>/mxnet/lib/python3.9/site-packages/mxnet/libmxnet.so
#14 0x00007efc5527683e in std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>) () from <snip>/mxnet/lib/python3.9/site-packages/mxnet/libmxnet.so
#15 0x00007efc5527381b in std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run() () from <snip>/mxnet/lib/python3.9/site-packages/mxnet/libmxnet.so
#16 0x00007efc84cba2bd in std::execute_native_thread_routine_compat (__p=<optimized out>) at /home/builder/ktietz/cos6/ci_cos6/ctng-compilers_1622658800915/work/.build/x86_64-conda-linux-gnu/src/gcc/libstdc++-v3/src/c++11/thread.cc:94
#17 0x00007efccae1be25 in start_thread () from /lib64/libpthread.so.0
#18 0x00007efccab45bad in clone () from /lib64/libc.so.6
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: issues-h...@mxnet.apache.org