I compiled MXNet from the 1.6.x branch against a non-standard CUDA version (CUDA 11.0, as the library paths in the trace below show).

I had to put

    #define THRUST_IGNORE_CUB_VERSION_CHECK 1

in multiple files under the /src/ directory to silence Thrust library errors 
(caused by a version mismatch with CUDA).

The Python library now builds successfully, and training works fine. When I 
load a model from disk, I do

    model.bind(...)
    model.set_params(arg_params, aux_params)
    ...
    model.predict(...)

and inference works fine too. But when the process exits, I get this stack trace:

    Segmentation fault: 11
    
    
    Segmentation fault: 11
    
    Stack trace:
      [bt] (0) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(+0x18c3f39) [0x7fabd6dacf39]
      [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fac02bff210]
      [bt] (2) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15a7541) [0x7faaf9c89541]
      [bt] (3) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15c710f) [0x7faaf9ca910f]
      [bt] (4) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(cudnnDestroy+0x8f) [0x7faaf88de72f]
      [bt] (5) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mshadow::DeleteStream<mshadow::gpu>(mshadow::Stream<mshadow::gpu>*)+0x116) [0x7fabd6cb1c56]
      [bt] (6) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x287) [0x7fabd6ccb007]
      [bt] (7) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x44) [0x7fabd6ccb3c4]
      [bt] (8) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run()+0x45) [0x7fabd6cc6095]
    Stack trace:
      [bt] (0) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(+0x18c3f39) [0x7fabd6dacf39]
      [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fac02bff210]
      [bt] (2) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15a7541) [0x7faaf9c89541]
      [bt] (3) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(+0x15c710f) [0x7faaf9ca910f]
      [bt] (4) /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(cudnnDestroy+0x8f) [0x7faaf88de72f]
      [bt] (5) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mshadow::DeleteStream<mshadow::gpu>(mshadow::Stream<mshadow::gpu>*)+0x116) [0x7fabd6cb1c56]
      [bt] (6) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x287) [0x7fabd6ccb007]
      [bt] (7) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x44) [0x7fabd6ccb3c4]
      [bt] (8) /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)>, std::shared_ptr<dmlc::ManualEvent> > > >::_M_run()+0x45) [0x7fabd6cc6095]
    Segmentation fault (core dumped)
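
The trace is long, so here is a minimal sketch of a helper that reduces each `[bt]` frame to its library file name and symbol. It may make it easier to see that the crash passes through `cudnnDestroy`, called from `mshadow::DeleteStream` inside the engine's `GPUWorker` thread. The names `FRAME_RE` and `summarize_frames` are hypothetical (they are not part of MXNet), and the pattern assumes the frame layout shown above: library path, `(symbol+offset)`, then `[address]`.

```python
import posixpath
import re

# One frame is assumed to look like:
#   [bt] (4) /path/to/lib.so(symbol+0x8f) [0x7faaf88de72f]
# The symbol part is empty when no symbol name is available for the address.
FRAME_RE = re.compile(
    r"\[bt\] \((?P<index>\d+)\) "
    r"(?P<library>\S+?)\((?P<symbol>.*?)\+0x[0-9a-f]+\) "
    r"\[0x[0-9a-f]+\]"
)


def summarize_frames(trace):
    """Return (frame index, library file name, symbol) for each [bt] frame."""
    # Collapse all whitespace so frames wrapped across lines still match.
    flat = " ".join(trace.split())
    return [
        (
            int(m.group("index")),
            posixpath.basename(m.group("library")),
            m.group("symbol") or "<no symbol>",
        )
        for m in FRAME_RE.finditer(flat)
    ]


if __name__ == "__main__":
    # Two frames copied from the trace above, wrapped the way the email wrapped them.
    sample = """
      [bt] (4)
    /usr/local/cuda-11.0/lib64/libcudnn_ops_infer.so.8(cudnnDestroy+0x8f)
    [0x7faaf88de72f]
      [bt] (5)
    /home/emil/Downloads/sources/incubator-mxnet/python/mxnet/../../build/libmxnet.so(void
     mshadow::DeleteStream<mshadow::gpu>(mshadow::Stream<mshadow::gpu>*)+0x116)
    [0x7fabd6cb1c56]
    """
    for index, library, symbol in summarize_frames(sample):
        print(index, library, symbol)
```

Pasting the full trace into `summarize_frames` should give one tuple per frame. Frames without a symbol name, such as the ones in `libc.so.6`, are reported as `<no symbol>`.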





---
[Visit Topic](https://discuss.mxnet.io/t/segmentationfault-on-process-exit-with-cuda-11-0-3/6538/1) or reply to this email to respond.