mseth10 opened a new issue #18564: URL: https://github.com/apache/incubator-mxnet/issues/18564

## Description

`test_gpu_memory_profiler_gluon` fails intermittently for different cu* flavors in nightly CD pipelines.

## Occurrences

1. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1257/pipeline
2. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1245/pipeline

## Error log

```
[2020-06-14T15:23:01.268Z] ________________________ test_gpu_memory_profiler_gluon ________________________
[2020-06-14T15:23:01.268Z] [gw1] linux -- Python 3.6.9 /opt/rh/rh-python36/root/usr/bin/python3
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z]     @pytest.mark.skipif(mx.context.num_gpus() == 0, reason="GPU memory profiler records allocation on GPUs only")
[2020-06-14T15:23:01.268Z]     def test_gpu_memory_profiler_gluon():
[2020-06-14T15:23:01.268Z]         enable_profiler(profile_filename='test_profiler.json',
[2020-06-14T15:23:01.268Z] >                       run=True, continuous_dump=True)
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] tests/python/unittest/test_profiler.py:537:
[2020-06-14T15:23:01.268Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2020-06-14T15:23:01.268Z] tests/python/unittest/test_profiler.py:40: in enable_profiler
[2020-06-14T15:23:01.268Z]     aggregate_stats=aggregate_stats)
[2020-06-14T15:23:01.268Z] python/mxnet/profiler.py:69: in set_config
[2020-06-14T15:23:01.268Z]     profiler_kvstore_handle))
[2020-06-14T15:23:01.268Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] ret = -1
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z]     def check_call(ret):
[2020-06-14T15:23:01.268Z]         """Check the return value of C API call.
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z]         This function will raise an exception when an error occurs.
[2020-06-14T15:23:01.268Z]         Wrap every API call with this function.
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z]         Parameters
[2020-06-14T15:23:01.268Z]         ----------
[2020-06-14T15:23:01.268Z]         ret : int
[2020-06-14T15:23:01.268Z]             return value from API calls.
[2020-06-14T15:23:01.268Z]         """
[2020-06-14T15:23:01.268Z]         if ret != 0:
[2020-06-14T15:23:01.268Z] >           raise get_last_ffi_error()
[2020-06-14T15:23:01.268Z] E           mxnet.base.MXNetError: Traceback (most recent call last):
[2020-06-14T15:23:01.268Z] E             [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXSetProcessProfilerConfig+0x1bb) [0x7f937ce083eb]
[2020-06-14T15:23:01.268Z] E             [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetConfig(int, std::string, bool, float, bool)+0x85) [0x7f93824c4ba5]
[2020-06-14T15:23:01.268Z] E             [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetContinuousProfileDump(bool, float)+0x8b8) [0x7f93824c4428]
[2020-06-14T15:23:01.268Z] E             [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::ThreadGroup::Thread::joinable() const+0xbf) [0x7f93824c637f]
[2020-06-14T15:23:01.268Z] E             [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x6d) [0x7f937cc7df3d]
[2020-06-14T15:23:01.268Z] E             File "../include/dmlc/thread_group.h", line 226
[2020-06-14T15:23:01.268Z] E           MXNetError: Check failed: auto_remove_ == false (1 vs. 0) :
[2020-06-14T15:23:01.268Z]
[2020-06-14T15:23:01.268Z] python/mxnet/base.py:246: MXNetError
[2020-06-14T15:23:01.268Z] ---------------------------- Captured stderr setup -----------------------------
[2020-06-14T15:23:01.268Z] DEBUG:root:np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
[2020-06-14T15:23:01.268Z] ------------------------------ Captured log setup ------------------------------
[2020-06-14T15:23:01.268Z] DEBUG root:conftest.py:193 np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
[2020-06-14T15:23:01.268Z] --------------------------- Captured stderr teardown ---------------------------
[2020-06-14T15:23:01.268Z] INFO:root:np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
[2020-06-14T15:23:01.268Z] ---------------------------- Captured log teardown -----------------------------
[2020-06-14T15:23:01.268Z] INFO root:conftest.py:210 np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org