That's pretty awesome. And what nice documentation! Clicking through https://github.com/apache/airflow/issues/10118 and https://github.com/apache/airflow/pull/10768, it looks like the actual quarantining/unquarantining is manual, yes? So we could reach this level with JUnit categories for Java anyhow. We would just want a good way to get test-level history to review, which I think the Jenkins Build History plugin now gives us. It would be great to have automation to let us know when a test becomes stable or flaky.
Kenn

On Tue, Mar 16, 2021 at 5:29 PM Tyson Hamilton <[email protected]> wrote:

> The Apache Airflow project has some interesting automation around flaky
> tests. They annotate such flaky tests as 'quarantined'; those quarantined
> tests still run (maybe even with retries?) but won't fail a test suite.
> Quarantined tests are run in a separate scheduled job; when they start
> passing, they are no longer quarantined. GitHub issues are updated with the
> status.
>
> [1]:
> https://github.com/apache/airflow/blob/master/CI.rst#scheduled-quarantined-builds
>
> On Tue, Mar 16, 2021 at 4:06 PM Kenneth Knowles <[email protected]> wrote:
>
>> I expect the suite to be permared, right? Because of something or
>> other flaking at all times.
>>
>> Kenn
>>
>> On Tue, Mar 16, 2021 at 2:13 PM Alex Amato <[email protected]> wrote:
>>
>>> Is it possible to make the presubmit auto-retry all failed tests a few
>>> times (and maybe generate a report listing the flaky tests)?
>>> Then you don't need to disable/isolate the flaky tests.
>>>
>>> If this is not possible, or hard to set up, then manually moving them to
>>> a different suite sounds like a good idea.
>>>
>>> On Tue, Mar 16, 2021 at 2:11 PM Pablo Estrada <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>> In Beam, we sometimes hit the issue of having one or two test cases
>>>> that are particularly flaky, and we deactivate them.
>>>> This is completely reasonable to me, because we need to keep good
>>>> testing signal on our primary suites.
>>>> The danger of deactivating these tests is that, although we have good
>>>> practices to file JIRA issues to re-enable them, it is still easy for these
>>>> issues and tests to be forgotten.
>>>> Of course, ideally, the solution is "do not forget old deactivated
>>>> tests" - and we should adopt practices to ensure that.
>>>>
>>>> I think, to strengthen our practices, we can reinforce them with a
>>>> pragmatic choice: Instead of fully deactivating tests, we can make them run
>>>> in a separate suite of Flaky tests. Why would this help?
>>>>
>>>> - It would allow us to make sure that flaky tests continue to *be able
>>>> to run*.
>>>> - It would remind us that we have flaky tests that need fixing.
>>>> - It would allow us to experiment with fixes to these tests on the Flaky
>>>> suite, and once they're reliable, move them to the main suite.
>>>>
>>>> Does this make sense to others?
>>>> Best
>>>> -P.
>>>>
>>>
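For illustration only, the retry-and-report idea raised in the thread could be sketched roughly as below. This is not how Beam's presubmit or Airflow's quarantine job works; the names `classify_tests` and `make_flaky`, the three-attempt default, and the "stable" / "flaky" / "failing" labels are all assumptions made for this sketch. A test that passes on the first attempt is labeled stable, one that passes only after a retry is labeled flaky, and one that never passes is labeled failing.

```python
from typing import Callable, Dict


def classify_tests(
    tests: Dict[str, Callable[[], None]], attempts: int = 3
) -> Dict[str, str]:
    """Run each test up to `attempts` times and label the outcome.

    Hypothetical helper: a test is a zero-argument callable that raises
    on failure. Labels are "stable", "flaky", or "failing".
    """
    report: Dict[str, str] = {}
    for name, test in tests.items():
        failures = 0
        passed = False
        for _ in range(attempts):
            try:
                test()
            except Exception:
                # Count the failure and retry until attempts run out.
                failures += 1
            else:
                passed = True
                break
        if passed and failures == 0:
            report[name] = "stable"
        elif passed:
            report[name] = "flaky"  # passed only after at least one retry
        else:
            report[name] = "failing"
    return report


def make_flaky(fail_first: int) -> Callable[[], None]:
    """Build a fake test that fails its first `fail_first` calls (demo only)."""
    calls = {"n": 0}

    def test() -> None:
        calls["n"] += 1
        assert calls["n"] > fail_first, "simulated intermittent failure"

    return test


if __name__ == "__main__":
    demo = {
        "always_passes": make_flaky(0),
        "passes_on_retry": make_flaky(1),
        "never_passes": make_flaky(5),
    }
    for test_name, label in classify_tests(demo).items():
        print(f"{test_name}: {label}")
```

A report like this, kept per run, is one possible input for deciding when a test should move into or out of a separate Flaky suite; the decision itself would still need a policy the thread has not settled.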
