Hello,

we currently have a false-failure rate of about 59% in our CI. This causes a lot of PRs to fail because of flaky tests. We're in the process of fixing the problematic tests, but we're not yet at a point that we can consider stable.
A few colleagues proposed disabling tests temporarily to increase the stability of the system. I know that this is heavily opposed by a lot of people, including myself, but these issues are pretty much stalling our development. Thus, I'd like to disable the flaky tests for now and make re-enabling them a release requirement.

We currently have 12 disabled tests (dating back to October 2017) and I'd like to add a few more. To keep track of disabled tests, I have added a new label [1]. If a test gets fixed, its issue will be closed and will thus automatically disappear from the list.

To address the community's concerns about tests being disabled and then forgotten, I'd like to make it a release requirement for 1.3 that no tests are disabled. This means we will have reduced coverage temporarily, but it will not impact our customers, since the tests will be re-enabled for the next release; this is basically a two-way door. On the other hand, we're improving the turnaround time for PRs and reducing the frustration level.

By the way, we currently have an internal sprint running at Amazon in which everybody is focusing on addressing flaky tests, so this state will not persist for long.

Best regards,
Marco

[1]: https://github.com/apache/incubator-mxnet/issues?q=is%3Aopen+is%3Aissue+label%3A%22Disabled+test%22
