haojin2 opened a new issue #16943: Failing Test: test_contrib_amp.test_amp_conversion
URL: https://github.com/apache/incubator-mxnet/issues/16943

Happening multiple times across different platforms and PRs:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-16870/9/pipeline/356
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-16788/20/pipeline

```
test_contrib_amp.test_amp_conversion ... [11:22:18] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[11:22:18] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
[11:22:21] src/base.cc:80: cuDNN lib mismatch: linked-against version 7501 != compiled-against version 7600. Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.
terminate called after throwing an instance of 'dmlc::Error'
  what(): [11:22:23] /work/mxnet/3rdparty/mshadow/mshadow/./stream_gpu-inl.h:107: Check failed: err == CUBLAS_STATUS_SUCCESS (7 vs. 0) : Destory cublas handle failed
Stack trace:
  [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7f724907dff2]
  [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mshadow::Stream<mshadow::gpu>::DestroyBlasHandle()+0x10f) [0x7f724cafc8ff]
  [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(void mshadow::DeleteStream<mshadow::gpu>(mshadow::Stream<mshadow::gpu>*)+0xb7) [0x7f724cafd227]
  [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mshadow::Stream<mshadow::gpu>* mshadow::NewStream<mshadow::gpu>(bool, bool, int)+0x313) [0x7f724cafd873]
  [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::GPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, bool, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x18f) [0x7f724cb229df]
  [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#4}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x4e) [0x7f724cb22c1e]
  [bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x4a) [0x7f724cb0848a]
  [bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f72bbb44c80]
  [bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f72c33fb6ba]

/work/runtime_functions.sh: line 1114: 146 Aborted (core dumped) nosetests-3.4 $NOSE_COVERAGE_ARGUMENTS $NOSE_TIMER_ARGUMENTS --with-xunit --xunit-file nosetests_gpu.xml --verbose tests/python/gpu
```

@ptrendx Any insights?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
