mseth10 opened a new issue #18564:
URL: https://github.com/apache/incubator-mxnet/issues/18564


   ## Description
   `test_gpu_memory_profiler_gluon` fails intermittently for different cu* flavors in nightly CD pipelines.
   
   ## Occurrences
   1. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1257/pipeline
   2. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1245/pipeline
   
   ## Error log
   ```
   [2020-06-14T15:23:01.268Z] ________________________ test_gpu_memory_profiler_gluon ________________________
   [2020-06-14T15:23:01.268Z] [gw1] linux -- Python 3.6.9 /opt/rh/rh-python36/root/usr/bin/python3
   [2020-06-14T15:23:01.268Z] 
   [2020-06-14T15:23:01.268Z]     @pytest.mark.skipif(mx.context.num_gpus() == 0, reason="GPU memory profiler records allocation on GPUs only")
   [2020-06-14T15:23:01.268Z]     def test_gpu_memory_profiler_gluon():
   [2020-06-14T15:23:01.268Z]         enable_profiler(profile_filename='test_profiler.json',
   [2020-06-14T15:23:01.268Z] >                       run=True, continuous_dump=True)
   [2020-06-14T15:23:01.268Z] 
   [2020-06-14T15:23:01.268Z] tests/python/unittest/test_profiler.py:537: 
   [2020-06-14T15:23:01.268Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
   [2020-06-14T15:23:01.268Z] tests/python/unittest/test_profiler.py:40: in enable_profiler
   [2020-06-14T15:23:01.268Z]     aggregate_stats=aggregate_stats)
   [2020-06-14T15:23:01.268Z] python/mxnet/profiler.py:69: in set_config
   [2020-06-14T15:23:01.268Z]     profiler_kvstore_handle))
   [2020-06-14T15:23:01.268Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
   [2020-06-14T15:23:01.268Z] 
   [2020-06-14T15:23:01.268Z] ret = -1
   [2020-06-14T15:23:01.268Z] 
   [2020-06-14T15:23:01.268Z]     def check_call(ret):
   [2020-06-14T15:23:01.268Z]         """Check the return value of C API call.
   [2020-06-14T15:23:01.268Z]     
   [2020-06-14T15:23:01.268Z]         This function will raise an exception when an error occurs.
   [2020-06-14T15:23:01.268Z]         Wrap every API call with this function.
   [2020-06-14T15:23:01.268Z]     
   [2020-06-14T15:23:01.268Z]         Parameters
   [2020-06-14T15:23:01.268Z]         ----------
   [2020-06-14T15:23:01.268Z]         ret : int
   [2020-06-14T15:23:01.268Z]             return value from API calls.
   [2020-06-14T15:23:01.268Z]         """
   [2020-06-14T15:23:01.268Z]         if ret != 0:
   [2020-06-14T15:23:01.268Z] >           raise get_last_ffi_error()
   [2020-06-14T15:23:01.268Z] E           mxnet.base.MXNetError: Traceback (most recent call last):
   [2020-06-14T15:23:01.268Z] E             [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXSetProcessProfilerConfig+0x1bb) [0x7f937ce083eb]
   [2020-06-14T15:23:01.268Z] E             [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetConfig(int, std::string, bool, float, bool)+0x85) [0x7f93824c4ba5]
   [2020-06-14T15:23:01.268Z] E             [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetContinuousProfileDump(bool, float)+0x8b8) [0x7f93824c4428]
   [2020-06-14T15:23:01.268Z] E             [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::ThreadGroup::Thread::joinable() const+0xbf) [0x7f93824c637f]
   [2020-06-14T15:23:01.268Z] E             [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x6d) [0x7f937cc7df3d]
   [2020-06-14T15:23:01.268Z] E             File "../include/dmlc/thread_group.h", line 226
   [2020-06-14T15:23:01.268Z] E           MXNetError: Check failed: auto_remove_ == false (1 vs. 0) :
   [2020-06-14T15:23:01.268Z] 
   [2020-06-14T15:23:01.268Z] python/mxnet/base.py:246: MXNetError
   [2020-06-14T15:23:01.268Z] ---------------------------- Captured stderr setup -----------------------------
   [2020-06-14T15:23:01.268Z] DEBUG:root:np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
   [2020-06-14T15:23:01.268Z] ------------------------------ Captured log setup ------------------------------
   [2020-06-14T15:23:01.268Z] DEBUG    root:conftest.py:193 np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
   [2020-06-14T15:23:01.268Z] --------------------------- Captured stderr teardown ---------------------------
   [2020-06-14T15:23:01.268Z] INFO:root:np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
   [2020-06-14T15:23:01.268Z] ---------------------------- Captured log teardown -----------------------------
   [2020-06-14T15:23:01.268Z] INFO     root:conftest.py:210 np/mx/python random seeds are set to 794738585, use MXNET_TEST_SEED=794738585 to reproduce.
   ```
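
A possible way to retry this locally, sketched under assumptions: the log above gives the test path, the test name, and the `MXNET_TEST_SEED` value, but the helper names `build_repro` and `run_until_failure` below are hypothetical and not part of MXNet. The sketch also assumes pytest is installed and the command is run from the repository root on a machine with a GPU. Because the failure is intermittent, setting the seed alone may not trigger it, so the run is repeated a bounded number of times.

```python
import os
import subprocess
import sys

# Test path and name as they appear in the log above.
TEST_ID = "tests/python/unittest/test_profiler.py::test_gpu_memory_profiler_gluon"
# Seed reported in the captured log output.
SEED = "794738585"


def build_repro(seed=SEED, test_id=TEST_ID):
    """Return the pytest command and environment for one run (hypothetical helper)."""
    env = dict(os.environ, MXNET_TEST_SEED=seed)
    cmd = [sys.executable, "-m", "pytest", "-x", test_id]
    return cmd, env


def run_until_failure(max_runs=20):
    """Repeat the test until it fails; return the failing attempt number or None."""
    cmd, env = build_repro()
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, env=env)
        if result.returncode != 0:
            return attempt
    return None


if __name__ == "__main__":
    failed_at = run_until_failure()
    if failed_at is None:
        print("no failure observed")
    else:
        print("failed on attempt", failed_at)
```

If no run fails, that does not rule out the problem; the backtrace points at thread handling in `SetContinuousProfileDump`, which may depend on timing rather than on the random seed.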


