jinhuang415 commented on issue #10433: [MXNET-290] MKLDNN support for model quantization
URL: https://github.com/apache/incubator-mxnet/pull/10433#issuecomment-393196924
 
 
   @reminisce @zheng-da We have addressed all of the review comments. Could you check whether you have any further comments on the change?
   @marcoabreu @zheng-da We have been seeing frequent Jenkins failures after submitting new changes, and most of them occur in the CPP:GPU unit test (see http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10433/51/pipeline/726/). The tests pass on our local GPU machine, and re-triggering Jenkins sometimes makes the build pass, but it is not stable and several re-triggers are occasionally needed. Since our change should not affect the CPP:GPU tests, could you check whether this is a known issue with the Jenkins system or with the MXNet code base? Is there any way to debug such a failure on Jenkins? Thanks. The failure log is copied below for reference:
   
   ```
   [14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:133: Stopping: NaiveEngine

   [14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:135: Stopped: NaiveEngine Starting...

   [14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:137: Started: NaiveEngine Done...

   [14:18:36] /work/mxnet/tests/cpp/engine/threaded_engine_test.cc:133: Stopping: ThreadedEnginePooled

   terminate called after throwing an instance of 'std::system_error'

     what():  Operation not permitted

   /work/runtime_functions.sh: line 476:     7 Aborted                 (core dumped) build/tests/mxnet_unit_tests

   build.py: 2018-05-30 14:18:38,174 Running of command in container failed (134): nvidia-docker run --rm -t --shm-size=500m -v /home/jenkins_slave/workspace/ut-cpp-gpu:/work/mxnet -v /home/jenkins_slave/workspace/ut-cpp-gpu/build:/work/build -u 1001:1001 mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp

   build.py: 2018-05-30 14:18:38,175 You can try to get into the container by using the following command: nvidia-docker run --rm -t --shm-size=500m -v /home/jenkins_slave/workspace/ut-cpp-gpu:/work/mxnet -v /home/jenkins_slave/workspace/ut-cpp-gpu/build:/work/build -u 1001:1001 -ti --entrypoint /bin/bash mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp

   into container: False

   Traceback (most recent call last):

     File "ci/build.py", line 307, in <module>

       sys.exit(main())

     File "ci/build.py", line 243, in main

       container_run(platform, docker_binary, shared_memory_size, command)

     File "ci/build.py", line 154, in container_run

       raise subprocess.CalledProcessError(ret, cmd)

   subprocess.CalledProcessError: Command 'nvidia-docker run --rm -t --shm-size=500m -v /home/jenkins_slave/workspace/ut-cpp-gpu:/work/mxnet -v /home/jenkins_slave/workspace/ut-cpp-gpu/build:/work/build -u 1001:1001 mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp' returned non-zero exit status 134

   script returned exit code 1
   ```
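   For reference, the log above already contains the exact container invocation the CI used, so the failing stage can be reproduced outside Jenkins. Below is a minimal sketch of how we run it locally, assuming nvidia-docker is installed and the `mxnet/build.ubuntu_gpu` CI image is available locally (it is normally built by the `ci/build.py` tooling); the `MXNET_SRC`/`MXNET_BUILD` paths are placeholders, since the `/home/jenkins_slave/workspace/ut-cpp-gpu` paths in the log are specific to the CI slave:

   ```bash
   # Hypothetical local paths; substitute your own MXNet checkout and build dir
   # (the CI mounts /home/jenkins_slave/workspace/ut-cpp-gpu here instead).
   MXNET_SRC=/path/to/incubator-mxnet
   MXNET_BUILD=$MXNET_SRC/build

   # Re-run the failing CPP:GPU stage the same way build.py invoked it on the slave.
   nvidia-docker run --rm -t --shm-size=500m \
       -v "$MXNET_SRC":/work/mxnet \
       -v "$MXNET_BUILD":/work/build \
       -u 1001:1001 \
       mxnet/build.ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_gpu_cpp

   # Variation of the interactive command build.py prints: drop the trailing
   # script arguments so /bin/bash stays interactive, then run
   # build/tests/mxnet_unit_tests by hand (e.g. under gdb) to see where
   # ThreadedEnginePooled aborts with "Operation not permitted".
   nvidia-docker run --rm --shm-size=500m \
       -v "$MXNET_SRC":/work/mxnet \
       -v "$MXNET_BUILD":/work/build \
       -u 1001:1001 -ti \
       --entrypoint /bin/bash \
       mxnet/build.ubuntu_gpu
   ```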
