This issue is hugely important. At Lucidworks we have implemented a "Test Confidence" role that focuses on improving the ability of everyone in the community to trust that reported failures from any of the Jenkins systems are actual failures and not flakey tests. The role rotates among the committers on our Solr Team, with each committer assigned to it for a 2-week period. Our goal is to have at least one committer on our team focused full-time on improving test confidence at all times. (A note on timing: we started this last summer, but we only recently reconfirmed our commitment to having someone assigned to it at all times.)
One of the guidelines we've agreed to is that the person in the role should not look (only) at tests he has worked on. Instead, he should focus on tests that fail less than 100% of the time and/or are hard to reproduce *even if he didn't write the test or the code*. Another aspect of the Test Confidence role is to try to develop tools that can help the community overall in improving this situation.

Two things have grown out of this effort so far:

* Steve Rowe's work on a Jenkins job to reproduce test failures (LUCENE-8106)
* Hoss has worked on aggregating all test failures from the 3 Jenkins systems (ASF, Policeman, and Steve's), downloading the test results & logs, and running some reports/stats on failures. He should be ready to share this more publicly soon.

I think it's important to understand that flakey tests will *never* go away. There will always be a new flakey test to review/fix. Our goal should be to make it so that most of the time you can assume the test is broken, and only discover it's flakey as part of digging.

The idea of @BadApple marking (or some other notation) is an OK idea, but the problem is so bad today that I worry marking alone does nothing to ensure the tests actually get fixed. Lots of JIRAs get filed for problems with tests - I count about 180 open issues today - and many just sit there forever. The biggest thing I want to avoid is making it even easier to avoid/ignore these tests. We should try to make it easier to highlight them, and we need a concerted effort to fix the tests once they've been identified as flakey.

On Wed, Feb 21, 2018 at 5:03 PM, Uwe Schindler <[email protected]> wrote:

> Hi,
>
> > Flakey Test Problems:
> > a) Flakey tests create so much noise that people no longer pay
> > attention to the automated reporting via email.
> > b) When running unit tests manually before a commit (i.e. "ant test")
> > a flakey test can fail.
> >
> > Solutions:
> > We could fix (a) by marking as flakey and having a new target
> > "non-flakey" that is run by the jenkins jobs that are currently run
> > continuously.
>
> We have a solution for this already: mark all those tests with @AwaitsFix
> or @BadApple.
> By default those aren't executed in Jenkins runs and also not for
> developers, but devs can enable/disable them using -Dtests.awaitsfix=true
> and -Dtests.badapples=true:
>
> [help] # Test groups. ----------------------------------------------------
> [help] #
> [help] # test groups can be enabled or disabled (true/false). Default
> [help] # value provided below in [brackets].
> [help]
> [help] ant -Dtests.nightly=[false]   - nightly test group (@Nightly)
> [help] ant -Dtests.weekly=[false]    - weekly tests (@Weekly)
> [help] ant -Dtests.awaitsfix=[false] - known issue (@AwaitsFix)
> [help] ant -Dtests.slow=[true]       - slow tests (@Slow)
>
> We can of course also make a weekly Jenkins job that enables those tests
> on Jenkins only weekly (like the nightly stuff). We have "tests.badapples" and
> "tests.awaitsfix" - I don't know what the difference between the two is.
>
> So we have 2 options to classify tests, let's choose one and apply it to
> all Flakey tests!
>
> Uwe
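To make the marking Uwe describes above concrete, here is a rough sketch of what a flakey test annotated this way could look like. It assumes the BadApple and AwaitsFix annotations nested in LuceneTestCase, each taking a bugUrl that points at the tracking JIRA; the class name and issue number below are placeholders for illustration, not real references.

import org.apache.lucene.util.LuceneTestCase;
import org.apache.solr.SolrTestCaseJ4;
import org.junit.Test;

// Hypothetical example: the whole class is skipped by default and only runs
// when the badapples test group is enabled on the command line.
@LuceneTestCase.BadApple(bugUrl = "https://issues.apache.org/jira/browse/SOLR-NNNNN")
public class FlakeyExampleTest extends SolrTestCaseJ4 {

  // A single method can be marked instead of the whole class, e.g. with
  // @LuceneTestCase.AwaitsFix(bugUrl = "...") for a test awaiting a known fix.
  @Test
  public void testSometimesFails() throws Exception {
    // ... the intermittently failing assertions would live here ...
  }
}

A test marked like this would then only run when the corresponding group is enabled, e.g. "ant test -Dtests.badapples=true", matching the [help] output quoted above.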
