Thanks for sharing that overview, Anna!
When I chased down integration failures caused by flaky tests over the summer, I saw a couple of patterns worth keeping in mind when investigating flaky tests, or when writing new ones:

* QTest::qWaitForWindowActive - very often, a test doesn’t need an active window at all, just an exposed one. Use QTest::qWaitForWindowExposed instead.
* Stress tests for data races: if your test doesn’t expose any race conditions when run with QThread::idealThreadCount threads, then it’s unlikely to expose races when run with more threads. But with time sharing, the threads might run a lot longer than you expect. See e.g. https://codereview.qt-project.org/c/qt/qtbase/+/421391
* Hardcoded waiting times are an anti-pattern. I know they’re not always avoidable (we don’t have qWaitFor… helpers for everything), but when testing high-level functionality that relies on lower-level functionality, it’s a good idea to check that the lower-level bits worked. E.g. https://codereview.qt-project.org/c/qt/qtbase/+/421658

To the last point - tests can use our private APIs, so adding private infrastructure that makes it easier to write robust tests is a good idea!

Cheers,
Volker

> On 30 Aug 2022, at 18:53, Anna Wojciechowska <[email protected]> wrote:
>
> Hello,
>
> I would like to present the results of the July fixing of flaky tests.
>
> short version:
> We started with 19 instances of platform-specific flakiness (from 14 different tests).
>
> As a result:
> 11 instances of platform-specific flakiness were fixed (caused by 8 different tests)
> 4 instances were still flaky (from 4 different tests)
> 4 instances were blacklisted (2 different tests)
>
> The table under the link below shows more detailed information about the fixed tests:
> https://wiki.qt.io/Fixed_flaky_tests_in_July_2022
>
> long version:
> How was the problem approached?
> We collected data about flakiness from June; in July we created a list of the top
> "worst" cases that failed integrations, and we contacted the module maintainers.
> We gave the changes some time to be merged and to run a sufficient number of
> times to gain confidence that the fixes actually worked, and in late August we
> checked the results again.
>
> The complete list of flaky tests from June that were being fixed in July can
> be found at this link:
> https://testresults.qt.io/grafana/d/000000007/flaky-summary-ci-test-info?orgId=1&from=1656626400000&to=1659304799000&viewPanel=65
>
> Which tests were taken into the analysis?
> Tests from the dev branch that negatively impacted the integration system by
> causing at least one failure in an integration and at least one flaky event.
>
> What is the difference between a failed and a flaky test?
> You can find a good explanation here:
> https://testresults.qt.io/grafana/d/000000007/flaky-summary-ci-test-info?orgId=1&viewPanel=55
> and here:
> https://testresults.qt.io/grafana/d/000000007/flaky-summary-ci-test-info?orgId=1&viewPanel=41
>
> What is understood by a "test"?
> A test is an umbrella term for a pair: a test case and a test function. A test
> case (usually a .cpp file) contains several test functions that return results
> (pass, fail, or xfail). We collected and analyzed these results. Additionally,
> some tests contain data tags - test function arguments that provide even more
> detailed results - but we do not store those; the granularity of the data
> ends at the test function level.
>
> What is understood by "platform-specific flakiness"?
> A test runs on a specific platform, described by a "target operating
> system" and a "target architecture". In most cases, flakiness is related to a
> particular test run on a specific platform.
> E.g., the test case "tst_qmutex" with the test function "more stress" can return stable
> results on most platforms but be flaky on MacOS_11 X86_64 or on Windows_10_21H2
> X86_64.
> In such a case, it will be counted as 2 instances of "platform-specific flakiness"
> (MacOS_11 and Windows_10_21H2) caused by a single (unique, distinct) test.
>
> Since the July fixing produced good results, we repeated the procedure in August:
> we gathered data about the most damaging (integration-failing) flaky tests
> and compared it to July's data, to make sure only "new" tests are on the list.
> August's failing flakiness can be viewed under the link below. Developers
> and maintainers are welcome to check whether their tests are on the list.
> https://wiki.qt.io/Flaky_tests_that_caused_failures_in_August
>
> Big thanks to everyone participating in fixing the tests!
>
> Anna Wojciechowska
>
> The notebooks used to prepare this analysis can be found at:
> https://git.qt.io/qtqa/notebooks/-/tree/main/flakiness/august_2022
>
> _______________________________________________
> Development mailing list
> [email protected]
> https://lists.qt-project.org/listinfo/development
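To illustrate Volker's first point about waiting for an exposed rather than an active window, here is a minimal sketch. The class and slot names are made up for illustration, and the fragment assumes the usual QTest autotest boilerplate (QTEST_MAIN and the generated .moc include):

```cpp
#include <QtTest>
#include <QWidget>

// Hypothetical autotest illustrating the exposed-vs-active pattern.
class tst_Example : public QObject
{
    Q_OBJECT
private slots:
    void paintsOnShow()
    {
        QWidget w;
        w.show();
        // Prefer this: the test only needs the window to be on screen...
        QVERIFY(QTest::qWaitForWindowExposed(&w));
        // ...over this, which additionally requires the window manager to
        // grant input focus, and can time out on some CI configurations:
        // QVERIFY(QTest::qWaitForWindowActive(&w));
    }
};
```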
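For the third point, a hardcoded delay can often be replaced by polling for the condition the test actually depends on, using QTest::qWaitFor (available since Qt 5.10). The `server` object here is hypothetical; the pattern is what matters:

```cpp
#include <QtTest>

// Instead of a hardcoded delay:
//     QTest::qWait(500);   // flaky: 500 ms may not be enough under CI load
// wait for the lower-level condition itself, with a generous timeout:
QVERIFY(QTest::qWaitFor([&]() { return server.isListening(); }, 5000));
```

This fails fast with a clear condition when something is genuinely broken, and returns as soon as the condition holds instead of always sleeping for the worst case.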
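Anna's terminology (test case, test function, data tag) maps onto QTest's data-driven test structure; a hypothetical sketch, again assuming the standard autotest boilerplate:

```cpp
#include <QtTest>

// One test case (this class) containing test functions (the slots).
class tst_Strings : public QObject
{
    Q_OBJECT
private slots:
    void toUpper_data()
    {
        QTest::addColumn<QString>("input");
        QTest::addColumn<QString>("expected");
        QTest::newRow("ascii") << "hello" << "HELLO"; // "ascii" is a data tag
        QTest::newRow("empty") << "" << "";           // "empty" is another
    }
    void toUpper() // one test function; CI results are aggregated at this level
    {
        QFETCH(QString, input);
        QFETCH(QString, expected);
        QCOMPARE(input.toUpper(), expected);
    }
};
```

As the report notes, results per data tag are not stored, so flakiness in a single row is reported against the whole test function.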
