roywei commented on issue #16532: fix dropout gpu seed
URL: https://github.com/apache/incubator-mxnet/pull/16532#issuecomment-544388824
 
 
   I can now reproduce the CI failure locally by running the following on a P3.8x instance with the DLAMI:
   `ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_gpu_cu101 --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh unittest_ubuntu_python2_gpu`
   
   result:
   ```
   ======================================================================
   FAIL: test_operator_gpu.test_dropout_with_seed
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
       self.test(*self.arg)
     File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 620, in newfunc
       return func(*arg, **kw)
     File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 177, in test_new
       orig_test(*args, **kwargs)
     File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6946, in test_dropout_with_seed
       assert_almost_equal(b.asnumpy(), c.asnumpy())
     File "/work/mxnet/python/mxnet/test_utils.py", line 624, in assert_almost_equal
       raise AssertionError(msg)
   AssertionError:
   Items are not equal:
   Error 100000002004087734272.000000 exceeds tolerance rtol=1.000000e-05, atol=1.000000e-20 (mismatch at least 0.110000%).
   Location of maximum error: (0, 1), a=2.00000000, b=0.00000000
    ACTUAL: array([[0., 2., 2., ..., 2., 0., 0.],
          [0., 2., 2., ..., 0., 0., 2.],
          [2., 2., 2., ..., 0., 0., 2.],...
    DESIRED: array([[2., 0., 2., ..., 2., 0., 2.],
          [2., 2., 2., ..., 2., 2., 2.],
          [2., 0., 0., ..., 2., 2., 2.],...
   -------------------- >> begin captured stdout << ---------------------
   
   *** Maximum errors for vector of size 10000:  rtol=1e-05, atol=1e-20
   --------------------- >> end captured stdout << ----------------------
   -------------------- >> begin captured logging << --------------------
   common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=179619306 to reproduce.
   --------------------- >> end captured logging << ---------------------
   
   ```
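
For context, the property that the failing assertion checks can be sketched in plain NumPy. This is not the MXNet operator or the test's actual code: the `dropout` helper, the 100x100 input of ones, and the drop probability of 0.5 are all assumptions, chosen only because they are consistent with the 0./2. values and the vector size of 10000 in the log above.

```python
import numpy as np

def dropout(x, p, rng):
    """Hypothetical inverted dropout: zero each element with probability p
    and scale the survivors by 1 / (1 - p)."""
    keep = rng.random_sample(x.shape) >= p
    return x * keep / (1.0 - p)

# Assumed input: ones, so outputs are either 0 or 2 when p = 0.5.
x = np.ones((100, 100), dtype=np.float32)

# Same seed and a fresh generator per call: the two masks are identical.
b = dropout(x, 0.5, np.random.RandomState(179619306))
c = dropout(x, 0.5, np.random.RandomState(179619306))
same_seed_equal = np.array_equal(b, c)

# Generator state carried over between calls: the second call draws a
# different mask even though the seed was set once up front.
rng = np.random.RandomState(179619306)
d = dropout(x, 0.5, rng)
e = dropout(x, 0.5, rng)
carried_state_equal = np.array_equal(d, e)
```

In this sketch `same_seed_equal` is true and `carried_state_equal` is false. A mismatch like the one in the traceback is what you would see if the second run did not start from the same generator state as the first, although the log alone does not establish that this is the cause here.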
   
   **However, running the test standalone in the CI environment, with the same seed that failed above, passed:**
   ```
   MXNET_TEST_SEED=179619306 nosetests --logging-level=DEBUG --verbose -s tests/python/gpu/test_operator_gpu.py:test_dropout_with_seed
   [INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=980748466 to reproduce.
   [WARNING] *** test-level seed set: all "@with_seed()" tests run deterministically ***
   test_operator_gpu.test_dropout_with_seed ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=179619306 to reproduce.
   [07:36:44] ../src/base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7401, which is older than the oldest version tested by CI (7600).  Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.
   ok
   
   ----------------------------------------------------------------------
   Ran 1 test in 13.896s
   
   OK
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
