DickJC123 opened a new pull request #9791: CI test randomness 3
URL: https://github.com/apache/incubator-mxnet/pull/9791
 
 
   ## Description ##
   This is a rebasing and partial squashing of the stale "ci test randomness2" PR #8526. It is based on the premise that the CI should test the framework the way users do: with lots of different data. Running tests on random data has been problematic, though, with flaky tests a constant annoyance. This PR supplies the tools to track down that flakiness by explicitly seeding all tests that use random data and reporting those seeds when a test fails. The first commit of the PR rolls out the with_seed() decorator to the tests under the tests/python/unittests directory, with some hard-coded seeds designed to expose existing flakiness. Follow-up commits will supply known fixes, and further commits will touch the tests under tests/python/gpu.
   
   From PR #8526:
   
   This PR introduces a simple new level of control over unittest random seeds while providing inter-test random number generator (RNG) isolation. It is an improved replacement for the pending PR #8313. The improvements over that PR are:
   
   - A unittest that fails via an exception will have its seed reported in the test log. Reproducing the failure with the same-seeded data is simple and immediate.
   - The mx.random and Python random seeds are also set (identically to np.random), giving deterministic behavior and test isolation for the MXNet CPU and GPU RNGs (see the sketch after this list).
   - A unittest failure via a core dump can also be reproduced after the module test is re-run with debugging enabled.
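   To make the second point concrete, all three RNG sources that a test depends on are seeded with the same value. The following is a minimal sketch of the equivalent manual calls (the decorator described below performs these automatically; the seed value here is hypothetical):
    ```
    import random
    import numpy as np
    import mxnet as mx

    seed = 1234              # hypothetical per-test seed
    np.random.seed(seed)     # numpy RNG
    mx.random.seed(seed)     # MXNet RNGs (CPU and, if present, GPU)
    random.seed(seed)        # Python's built-in RNG
    ```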
   
   To provide this functionality, a custom decorator "@with_seed()" was created. This was considered more powerful than the nosetests "@with_setup()" facility, and less disruptive than changing all tests to become methods of a nosetests test class. The proposed new approach is demonstrated on a simple "test_module.py" file containing three tests. Assuming that the second test needs a fixed seed for robustness, the file might currently appear as:
   ```
   def test_op1():
       <op1 test>
   
   def test_op2():
       np.random.seed(1234)
       <op2 test>
   
   def test_op3():
       <op3 test>
   ```
   Even though test_op3() is fine with nondeterministic data, it will always run against a single dataset because it runs after test_op2, which sets the seed. Also, if test_op1() were to fail, there would be no way to reproduce the failure, other than running the test individually in the hope of producing a new and similar failure.
   
   With the proposed approach, the test file becomes:
   ```
   from common import setup_module, with_seed
   
   @with_seed()
   def test_op1():
       <op1 test>
   
   @with_seed(1234)
   def test_op2():
       <op2 test>
   
   @with_seed()
   def test_op3():
       <op3 test>
   ```
   Importing unittests/common.py initially sets the seeds of the numpy and mxnet RNGs to a random "module seed" that is output to the log file. These initial RNGs can be thought of as module-level RNGs that are isolated from the individual tests and that provide the string of "test seeds" determining the behavior of each test's RNGs. The "@with_seed()" test function decorator requests a test seed from the module RNG, sets the numpy, python and mxnet seeds appropriately, and runs the test. Should the test fail, the seed for that test is output to the log file. Pass or fail, the decorator reinstates the module RNG state before the next test's decorator is executed, effectively isolating the tests. Debugging a failing test_op3 in the example would proceed as follows:
   ```
   $ nosetests --verbose -s test_module.py
   [INFO] Setting module np/mx random seeds, use MXNET_MODULE_SEED=3444154063 
to reproduce.
   test_module.test_op1 ... ok
   test_module.test_op2 ... [INFO] Setting test np/mx/python random seeds, use 
MXNET_TEST_SEED=1234 to reproduce.
   ok
   test_module.test_op3 ... [INFO] Setting test np/mx/python random seeds, use 
MXNET_TEST_SEED=2096230603 to reproduce.
   FAIL
   ======================================================================
   FAIL: test_module.test_op3
   ----------------------------------------------------------------------
   Traceback (most recent call last):
   <stack trace appears here>
   -------------------- >> begin captured logging << --------------------
   common: INFO: Setting test np/mx/python random seeds, use 
MXNET_TEST_SEED=2096230603 to reproduce.
   --------------------- >> end captured logging << ---------------------
   ----------------------------------------------------------------------
   Ran 3 tests in 1.354s
   FAILED (failures=1)
   ```
   Because test_op3 failed, its seed appeared in the log file. The test_op2 seed was also displayed, as a reminder that that test needs more work before it is robust enough for random data. The command to reproduce the problem can be assembled simply by cutting and pasting from the log:
   ```
    $ MXNET_TEST_SEED=2096230603 nosetests --verbose -s test_module.py:test_op3
   ```
   If test_op3 instead dumped core, the test seed would not be immediately apparent. Assuming the core dump is repeatable given the same data, the module would first be re-run with the command:
   ```
    $ MXNET_MODULE_SEED=3444154063 nosetests --logging-level=DEBUG --verbose -s test_module.py
   ```
   The log would now include the test seeds for all tests before they are run, 
so the test could then be run in isolation as before with 
MXNET_TEST_SEED=2096230603.
   
   Let's assume that test_op3 was altered by increasing a tolerance. How robust 
is the test now? This can be explored by repeating the test many times as in:
   ```
    $ MXNET_TEST_COUNT=10000 nosetests --logging-level=DEBUG --verbose -s test_module.py:test_op3
   ```
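   Putting the pieces together, the behavior described above could be approximated by a setup_module()/with_seed() pair along the lines of the sketch below. This is a simplified, hypothetical illustration, not the PR's actual common.py implementation; the exact seed-precedence rules and logging details shown here are assumptions:
    ```
    import functools
    import logging
    import os
    import random

    import numpy as np
    import mxnet as mx

    logger = logging.getLogger('common')

    def setup_module():
        # Module-level seeding: honor MXNET_MODULE_SEED if set, otherwise pick a random seed.
        module_seed = int(os.getenv('MXNET_MODULE_SEED',
                                    str(np.random.randint(0, np.iinfo(np.int32).max))))
        logger.info('Setting module np/mx random seeds, use MXNET_MODULE_SEED=%d to reproduce.',
                    module_seed)
        np.random.seed(module_seed)
        mx.random.seed(module_seed)

    def with_seed(seed=None):
        # Per-test seeding with seed reporting on failure and module-RNG isolation.
        def decorator(test_fn):
            @functools.wraps(test_fn)
            def wrapper(*args, **kwargs):
                # Snapshot the module RNG state so this test cannot perturb later tests.
                module_rng_state = np.random.get_state()
                try:
                    for _ in range(int(os.getenv('MXNET_TEST_COUNT', '1'))):
                        # Assumed precedence: MXNET_TEST_SEED env var > hard-coded seed > module RNG.
                        if 'MXNET_TEST_SEED' in os.environ:
                            test_seed, forced = int(os.environ['MXNET_TEST_SEED']), True
                        elif seed is not None:
                            test_seed, forced = seed, True
                        else:
                            test_seed, forced = int(np.random.randint(0, np.iinfo(np.int32).max)), False
                        # Forced seeds are announced at INFO; random ones only at DEBUG.
                        logger.log(logging.INFO if forced else logging.DEBUG,
                                   'Setting test np/mx/python random seeds, use MXNET_TEST_SEED=%d '
                                   'to reproduce.', test_seed)
                        np.random.seed(test_seed)
                        mx.random.seed(test_seed)
                        random.seed(test_seed)
                        try:
                            test_fn(*args, **kwargs)
                        except BaseException:
                            # On failure, make sure the seed is visible in the captured log.
                            logger.info('Setting test np/mx/python random seeds, use '
                                        'MXNET_TEST_SEED=%d to reproduce.', test_seed)
                            raise
                finally:
                    # Pass or fail, reinstate the module RNG state to keep tests isolated.
                    np.random.set_state(module_rng_state)
            return wrapper
        return decorator
    ```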
   Finally, this PR adds the @with_seed() decorator to all tests in modules that use random numbers. It also includes many specific test-robustness fixes for issues that were exposed once this new methodology was adopted internally by the author.
   
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding 
a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing 
distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a 
new build option with NCCL)
   - [ ] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments 
are documented. 
   - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
   - [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be 
made.
   - Interesting edge cases to note here
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
