roywei commented on issue #16532: fix dropout gpu seed
URL: https://github.com/apache/incubator-mxnet/pull/16532#issuecomment-544388824

I am able to reproduce CI failure locally now by running the following on P3.8x with DLAMI

`ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_gpu_cu101 --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh unittest_ubuntu_python2_gpu`

result:
```
======================================================================
FAIL: test_operator_gpu.test_dropout_with_seed
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6946, in test_dropout_with_seed
    assert_almost_equal(b.asnumpy(), c.asnumpy())
  File "/work/mxnet/python/mxnet/test_utils.py", line 624, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 100000002004087734272.000000 exceeds tolerance rtol=1.000000e-05, atol=1.000000e-20 (mismatch at least 0.110000%).
Location of maximum error: (0, 1), a=2.00000000, b=0.00000000
 ACTUAL: array([[0., 2., 2., ..., 2., 0., 0.],
       [0., 2., 2., ..., 0., 0., 2.],
       [2., 2., 2., ..., 0., 0., 2.],...
 DESIRED: array([[2., 0., 2., ..., 2., 0., 2.],
       [2., 2., 2., ..., 2., 2., 2.],
       [2., 0., 0., ..., 2., 2., 2.],...
-------------------- >> begin captured stdout << ---------------------

*** Maximum errors for vector of size 10000: rtol=1e-05, atol=1e-20

--------------------- >> end captured stdout << ----------------------
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=179619306 to reproduce.
--------------------- >> end captured logging << ---------------------
```

**However, the test passes when run standalone with the same seed that failed under the CI environment:**

```
MXNET_TEST_SEED=179619306 nosetests --logging-level=DEBUG --verbose -s tests/python/gpu/test_operator_gpu.py:test_dropout_with_seed
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=980748466 to reproduce.
[WARNING] *** test-level seed set: all "@with_seed()" tests run deterministically ***
test_operator_gpu.test_dropout_with_seed ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=179619306 to reproduce.
[07:36:44] ../src/base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7401, which is older than the oldest version tested by CI (7600). Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.
ok

----------------------------------------------------------------------
Ran 1 test in 13.896s

OK
```
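
For context, the property the failing assertion compares can be sketched without MXNet. The snippet below is a minimal pure-Python illustration, not the actual test code. It assumes inverted dropout on an all-ones input with p = 0.5 (consistent with the 0 and 2 values in the arrays above), and `dropout` and `dropout_with_seed` are hypothetical helper names. Re-seeding before each call should yield identical outputs.

```python
import random


def dropout(values, p, rng):
    """Inverted dropout: zero each element with probability p and
    scale the survivors by 1 / (1 - p)."""
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


def dropout_with_seed(values, p, seed):
    # Build a fresh generator from the seed so every call starts
    # from the same random state (hypothetical helper, for illustration).
    return dropout(values, p, random.Random(seed))


ones = [1.0] * 10000   # same element count as the vector in the log
seed = 179619306       # test seed from the log, used here as an arbitrary integer

b = dropout_with_seed(ones, 0.5, seed)
c = dropout_with_seed(ones, 0.5, seed)

# Same seed, same input: the two outputs must match element for element.
assert b == c
```

In this sketch the generator is private to each call, so nothing else can advance its state between seeding and use. A failure that shows up only in the full suite would be consistent with the seeded state not being the only input to the mask, although the log above does not establish the cause.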