DickJC123 opened a new pull request #9791: CI test randomness 3
URL: https://github.com/apache/incubator-mxnet/pull/9791

## Description ##

This is a rebasing and partial squashing of the stale "ci test randomness2" PR #8526. It's based on the premise that the CI should test the framework the way users do: with lots of different data. Having tests run with random data has been problematic, though, with flaky tests being a constant annoyance. This PR supplies the tools to track down the flakiness by explicitly seeding all tests that use random data, and reporting those seeds if a test fails.

The first commit of the PR rolls out the `with_seed()` decorator to tests under the tests/python/unittests directory, with some hard-coded seeds designed to expose existing flakiness. Follow-up commits will supply known fixes. Further commits will touch the tests/python/gpu directory tests.

From PR #8526:

This PR introduces a new, simple level of control over unittest random seeds while providing inter-test random number generator (RNG) isolation. This PR is an improved replacement for the pending PR #8313. The improvements over that PR are:

- A unittest that fails via an exception will have its seed reported in the test log. Reproducing the failure with the same-seeded data is simple and immediate.
- The mx.random and python random seeds are also set (identically to np.random), giving deterministic behavior and test isolation for mxnet CPU and GPU RNGs.
- A unittest failure via a core dump can also be reproduced after the module test is re-run with debugging enabled.

To provide this functionality, a custom decorator `@with_seed()` was created. This was considered more powerful than the nosetests `@with_setup()` facility, and less disruptive than changing all tests to become methods of a nosetests test class. The proposed new approach is demonstrated on a simple "test_module.py" test file of three tests.
Assuming that the second test needs a set seed for robustness, the file might currently appear as:

```
def test_op1():
    <op1 test>

def test_op2():
    np.random.seed(1234)
    <op2 test>

def test_op3():
    <op3 test>
```

Even though test_op3() is OK with nondeterministic data, it will have only a single dataset because it is run after test_op2(), which sets the seed. Also, if test_op1() were to fail, there would be no way to reproduce the failure, except for running the test individually to produce a new and hopefully similar failure. With the proposed approach, the test file becomes:

```
from common import setup_module, with_seed

@with_seed()
def test_op1():
    <op1 test>

@with_seed(1234)
def test_op2():
    <op2 test>

@with_seed()
def test_op3():
    <op3 test>
```

By importing unittests/common.py, the seeds of the numpy and mxnet RNGs are set initially to a random "module seed" that is output to the log file. The initial RNGs can be thought of as module-level RNGs that are isolated from the individual tests and that provide the string of "test seeds" that determine the behavior of each test's RNGs.

The `@with_seed()` test function decorator requests a test seed from the module RNG, sets the numpy, python and mxnet seeds appropriately, and runs the test. Should the test fail, the seed for that test is output to the log file. Pass or fail, the decorator reinstates the module RNG state before the next test's decorator is executed, effectively isolating the tests.

Debugging a failing test_op3 in the example would proceed as follows:

```
$ nosetests --verbose -s test_module.py
[INFO] Setting module np/mx random seeds, use MXNET_MODULE_SEED=3444154063 to reproduce.
test_module.test_op1 ... ok
test_module.test_op2 ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1234 to reproduce.
ok
test_module.test_op3 ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2096230603 to reproduce.
FAIL

======================================================================
FAIL: test_module.test_op3
----------------------------------------------------------------------
Traceback (most recent call last):
<stack trace appears here>
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=2096230603 to reproduce.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 3 tests in 1.354s

FAILED (failures=1)
```

Because test_op3 failed, its seed appeared in the log file. Also, the test_op2 seed was displayed as a reminder that that test needs more work before it is robust enough for random data. The command to reproduce the problem is produced simply by cutting and pasting from the log:

```
$ MXNET_TEST_SEED=2096230603 nosetests --verbose -s test_module.py:test_op3
```

If test_op3 instead dumped core, the test seed would not be initially apparent. Assuming the core dump is repeatable based on the data, the module would first be re-run with the command:

```
$ MXNET_MODULE_SEED=3444154063 nosetests --logging-level=DEBUG --verbose -s test_module.py
```

The log would now include the test seeds for all tests before they are run, so the test could then be run in isolation as before with MXNET_TEST_SEED=2096230603.

Let's assume that test_op3 was altered by increasing a tolerance. How robust is the test now? This can be explored by repeating the test many times, as in:

```
$ MXNET_TEST_COUNT=10000 nosetests --logging-level=DEBUG --verbose -s test_module.py:test_op3
```

Finally, this PR adds the @with_seed() decorator for all tests in modules that use random numbers. It also includes many specific test robustness fixes that were exposed once this new methodology was adopted internally by the author.
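To make the mechanism concrete, here is a minimal, hypothetical sketch of how a `@with_seed()`-style decorator could work. It is an illustration only, not the actual unittests/common.py implementation: it seeds only Python's stdlib `random` (the real decorator also seeds np.random and mx.random identically), and the exact env-var precedence and seed ranges are assumptions.

```python
import functools
import logging
import os
import random

def with_seed(seed=None):
    """Hypothetical sketch of a per-test seeding decorator.
    Seed precedence (assumed): MXNET_TEST_SEED env var > hard-coded
    seed argument > a fresh seed drawn from the module-level RNG."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # Save the module-level RNG state so this test cannot perturb
            # the seeds handed to later tests (inter-test isolation).
            module_state = random.getstate()
            env_seed = os.getenv('MXNET_TEST_SEED')
            if env_seed is not None:
                test_seed = int(env_seed)   # reproduce a logged failure
            elif seed is not None:
                test_seed = seed            # @with_seed(1234)-style fixed seed
            else:
                test_seed = random.randint(0, 2**31 - 1)  # from module RNG
            random.seed(test_seed)
            try:
                # MXNET_TEST_COUNT repeats the test body to probe robustness
                # (simplified: the repeats here share one seeded stream).
                for _ in range(int(os.getenv('MXNET_TEST_COUNT', '1'))):
                    test_fn(*args, **kwargs)
            except BaseException:
                # On any failure, log the seed needed to reproduce it.
                logging.error('use MXNET_TEST_SEED=%d to reproduce', test_seed)
                raise
            finally:
                # Pass or fail, reinstate the module RNG state.
                random.setstate(module_state)
        return wrapper
    return decorator
```

A fixed-seed test decorated this way produces identical data on every run, while the `finally` clause guarantees that a test never changes what the next test sees, which is the isolation property the PR describes.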
## Checklist ##

### Essentials ###
- [ ] Passed code style checking (`make lint`)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
  - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
  - For user-facing API changes, the API doc string has been updated.
  - For new C++ functions in header files, their functionality and arguments are documented.
  - For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable.
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change.

### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)

## Comments ##
- If this change is a backward incompatible change, why must this change be made?
- Interesting edge cases to note here
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services