Hi MXNet community,

Thanks to the efforts of several community members, we have identified many flaky tests. These tests are currently disabled to keep continuous integration (CI) running smoothly, but as a result we have lost coverage of the features they exercise. They need to be fixed and re-enabled to ensure the quality of our releases. I'd like to propose the following:
1. Re-enable flaky Python tests with retries where feasible

Although these tests are unstable, they can still catch breaking changes. For example, if a test fails randomly with 10% probability, the probability that all three retries fail is 0.1^3 = 0.1%, whereas a genuine breaking change fails every time. Retries do increase testing time, but that is a compromise worth making to avoid bigger problems. (A sketch of what such a retry wrapper could look like is in the P.S. below.)

2. Set a standard for new tests

Having criteria that new tests must follow would improve not only the quality of the tests but also the quality of the code. I propose the following standard for tests:

- Passes reliably with good coverage
- Avoids randomness unless necessary (see the second sketch in the P.S. for keeping necessary randomness reproducible)
- Avoids external dependencies unless necessary (e.g. due to licensing)
- Is not resource-intensive unless necessary (e.g. scaling tests)

In addition, I'd like to call for volunteers to help fix these tests. New members are especially welcome, as this is a good opportunity to become familiar with MXNet. I'd also like to ask the members who wrote the affected features/tests to help, either by fixing them or by helping others understand the issues.

The effort to fix the tests is tracked at: https://github.com/apache/incubator-mxnet/issues/9412

Best regards,
Sheng
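
P.S. To make point 1 concrete, here is a minimal sketch of what a retry wrapper could look like. The decorator, the retry count, and the example test are illustrative only, not an existing MXNet utility:

import functools
import random

def retry(n=3):
    """Re-run a flaky test up to n attempts; fail only if all attempts fail.

    If a test fails independently with probability p, the chance that all
    n attempts fail is p**n (e.g. 0.1**3 = 0.001, i.e. 0.1%), while a real
    breaking change still fails on every attempt.
    """
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(n):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError:
                    if attempt == n - 1:
                        raise  # every attempt failed: report a real failure
        return wrapper
    return decorator

@retry(n=3)
def test_flaky_example():
    # Stand-in for a numerical check that fails ~10% of the time.
    assert random.random() > 0.1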
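For the "avoids randomness unless necessary" criterion in point 2, tests that genuinely need randomness can still be made reproducible by controlling the seed and printing it, so that a CI failure can be replayed. A sketch, assuming MXNet is installed; the helper name and the MXNET_TEST_SEED variable are illustrative:

import os
import random
import numpy as np
import mxnet as mx

def setup_test_seed():
    # To replay a CI failure, set MXNET_TEST_SEED to the seed printed
    # in the failing log; otherwise a fresh seed is drawn.
    seed = int(os.getenv('MXNET_TEST_SEED', random.randint(0, 2**31 - 1)))
    print('Using seed %d; set MXNET_TEST_SEED=%d to reproduce.' % (seed, seed))
    random.seed(seed)
    np.random.seed(seed)
    mx.random.seed(seed)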