Thanks for sharing that overview, Anna!

While chasing integration failures caused by flaky tests over the summer, I 
noticed a couple of patterns that I think are worth keeping in mind when 
investigating flaky tests, or when writing new ones:

* QTest::qWaitForWindowActive - very often, a test doesn’t need an active 
window at all, just an exposed one. Use QTest::qWaitForWindowExposed 
instead.

* stress tests for data races: if your test doesn’t expose any race conditions 
when run with QThread::idealThreadCount threads, then it’s unlikely to 
expose races when run with more threads. But with time sharing, the 
threads might run a lot longer than you expect. See e.g. 
https://codereview.qt-project.org/c/qt/qtbase/+/421391

* hardcoded waiting times are an anti-pattern. I know they’re not always 
avoidable (we don’t have qWaitFor… helpers for everything), but when testing 
high-level functionality that relies on lower-level functionality, it’s a 
good idea to check that the lower-level bits actually worked. E.g. 
https://codereview.qt-project.org/c/qt/qtbase/+/421658


To the last point - tests can use our private APIs, so adding private 
infrastructure that makes it easier to write robust tests is a good idea!


Cheers,
Volker



> On 30 Aug 2022, at 18:53, Anna Wojciechowska <[email protected]> wrote:
> 
> Hello,
> 
> I would like to present the results of the July effort to fix flaky tests.
> 
> short version:
> we started with 19 cases of platform-specific flakiness (from 14 different 
> tests)
> 
> as a result:
> 11 cases of platform-specific flakiness were fixed (caused by 8 different tests) 
> 4 cases of platform-specific flakiness remained flaky (from 4 different tests)
> 4 cases of flakiness were blacklisted (2 different tests)
> 
> The table under the link below shows more detailed information about fixed 
> tests.
> https://wiki.qt.io/Fixed_flaky_tests_in_July_2022
> 
> long version:
> How was the problem approached? 
> We collected data about flakiness in June; in July we created a list of the 
> top "worst" cases that failed integrations, and we contacted module 
> maintainers. We gave the changes some time to be merged and to run a 
> sufficient number of times to gain confidence that the fixes actually 
> worked - and in late August we checked the results again.
> 
> The complete list of flaky tests from June that were fixed in July can be 
> found at this link:
> https://testresults.qt.io/grafana/d/000000007/flaky-summary-ci-test-info?orgId=1&from=1656626400000&to=1659304799000&viewPanel=65
> 
> Which tests were included in the analysis? 
> Tests from the dev branch that negatively impacted the integration system by 
> causing at least 1 failure in any integration and at least 1 flaky event.
> 
> What is the difference between a failed and a flaky test?
> You can find a good explanation here:
> https://testresults.qt.io/grafana/d/000000007/flaky-summary-ci-test-info?orgId=1&viewPanel=55
> and here:
> https://testresults.qt.io/grafana/d/000000007/flaky-summary-ci-test-info?orgId=1&viewPanel=41
> 
> What is understood by a "test"?
> A test is an umbrella term for a pair: test case and test function. A test 
> case (usually a cpp file) contains several test functions that return results 
> (pass, fail, or xfail). We collected and analyzed these results. Additionally, 
> some tests contain data tags - test function arguments that provide even more 
> detailed results - however, we do not store them; the granularity of the data 
> ends at the test function level.
> 
> What is understood by "platform-specific flakiness"?
> A test runs on a specific platform - described by its "target operating 
> system" and "target architecture". In most cases, flakiness is tied to a 
> particular test run on a specific platform. 
> E.g., the test case "tst_qmutex" with the test function "more stress" can 
> return stable results on most platforms but be flaky on MacOS_11 X86_64 or on 
> Windows_10_21H2 X86_64. In such a case, it is counted as 2 cases of 
> "platform-specific flakiness" (MacOS_11 and Windows_10_21H2) caused by a 
> single (unique, distinct) test.
> 
> Since the July fixing effort produced good results, we repeated the procedure 
> in August: we gathered data about the most damaging (integration-failing) 
> flaky tests and compared it to July's, to make sure only "new" tests are on 
> the list. August's failing flakiness can be viewed under the link below. 
> Developers and maintainers are welcome to check whether their tests are on 
> the list.
> https://wiki.qt.io/Flaky_tests_that_caused_failures_in_August
> 
> Big thanks to everyone participating in fixing the tests!
> 
> Anna Wojciechowska
> 
> The notebooks used to prepare this analysis can be found at:
> https://git.qt.io/qtqa/notebooks/-/tree/main/flakiness/august_2022
> 
> _______________________________________________
> Development mailing list
> [email protected]
> https://lists.qt-project.org/listinfo/development
