Absolutely +1. Great idea! Happy to coordinate - and I hope others would like to join in as well :)

On Mon, Apr 20, 2020 at 12:04 PM Tomasz Urbaszek wrote:

> Got it!
>
> What would you say to organizing a more coordinated effort to improve
> our test suite, something like "Fridays with tests"? In a few weeks,
> this should result in a much better test suite and probably fewer
> problems with CI. This is also a nice way to take a look at Airflow
> internals :)
>
> Tomek
>
>
> On Mon, Apr 20, 2020 at 10:18 AM Jarek Potiuk wrote:
> >
> > Both - depending on the tests. I think for now I've been a bit
> > over-cautious, and after merging, once we observe a few runs in
> > production (and in other people's PRs), we might quickly bring down
> > the number of quarantined tests.
> >
> > I think most of the problematic tests are really "long-running" and
> > pretty stand-alone ones. I think part of the process should be that
> > if we find that they require some side effects, we will be able to
> > fix that, and eventually we will only have a few quarantined
> > "single tests" rather than "whole classes".
> >
> > On Mon, Apr 20, 2020 at 7:42 AM Tomasz Urbaszek wrote:
> >
> > > Thank you Jarek for your work!
> > > +1 for the idea of quarantined tests. Just one question: are we
> > > marking single tests or whole classes? This question is mostly
> > > related to tests that require some side effects from previous
> > > tests.
> > >
> > > Tomek
> > >
> > >
> > > On Mon, Apr 20, 2020 at 2:38 AM Jarek Potiuk wrote:
> > > >
> > > > Hello everyone,
> > > >
> > > > I have a proposal - very much COVID-19-inspired - on how to fix
> > > > our CI tests...
> > > >
> > > > After the recent problems with CI, together with Daniel and
> > > > Tomek we decided to make an emergency migration to GitHub
> > > > Actions. So we did.
> > > >
> > > > I think overall it was a good move, but we had some problems
> > > > with it. It turns out that while we were blaming Travis for
> > > > everything wrong that happened in our builds, it was not always
> > > > Travis' fault. We have some tests that are also failing in
> > > > GitHub Actions and I think it's high time we fixed them.
> > > >
> > > > I spent the better part of the weekend trying different things
> > > > and bringing numerous optimizations back to our CI configuration
> > > > (a lot of those were lost during the emergency move).
> > > >
> > > > While running it I had many issues and I think I found a good
> > > > way to handle our flaky tests. I would love to hear what others
> > > > think about it.
> > > >
> > > > Those interested - please take a look at the PR "Bring back CI
> > > > optimisations" https://github.com/apache/airflow/pull/8393
> > > > Corresponding GitHub Actions run here:
> > > > https://github.com/apache/airflow/actions/runs/82410109
> > > >
> > > > I implemented a lot of optimizations in this PR (some of them
> > > > will only take effect after we merge to master) but most of all
> > > > I wanted to introduce a concept of "quarantined tests" (good
> > > > name, isn't it? :) )
> > > >
> > > > Here is the idea:
> > > >
> > > > - tests that are marked as @pytest.mark.quarantined are skipped
> > > >   in regular runs (I identified 58 potential candidates - not
> > > >   all of them are flaky but I wanted to be safe)
> > > > - there is one dedicated "Quarantine" job that runs only
> > > >   quarantined tests (it's Postgres 9.6 with Python 3.6 for now)
> > > > - those "quarantined" tests are run with a 90 s timeout each and
> > > >   are rerun up to 3 times if they fail
> > > > - failure of any of the Quarantine tests does not fail the whole
> > > >   CI
> > > > - I plan to create GitHub issues for groups of those tests
> > > >   (MoveOutOfQuarantine NNNN)
> > > > - I think it's best if we split them between committers
> > > > - the job of the committers will be to observe the stability of
> > > >   those tests
> > > > - once we fix them and observe that the tests are "stable", we
> > > >   move them out of Quarantine back to regular tests (by removing
> > > >   @pytest.mark.quarantined)
> > > > - the goal is to move all our tests out of Quarantine
> > > > - in the future we can move any flaky test to Quarantine (by
> > > >   adding @pytest.mark.quarantined) and it will give us time to
> > > >   observe it and fix any flakiness.
> > > >
> > > > Let me know what you think of it.
> > > >
> > > > J.
> > > >
> > > > --
> > > > Jarek Potiuk
> > > > Polidea | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129
> > >
> > > --
> > > Tomasz Urbaszek
> > > Polidea | Software Engineer
> > >
> > > M: +48 505 628 493

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
