Thanks for sending this, Benno! I for one would love to see more regular
communication about the state of CI, especially so that I know how I can
help fix tests (right now I don't know which flaky tests are in areas I am
maintaining).

Is there any reason the first portion of the test name is being truncated?
For example, ResourceStatistics matches several tests:

$ grep -R ' ResourceStatistics)' src/tests
src/tests/containerizer/xfs_quota_tests.cpp:TEST_F(ROOT_XFS_QuotaTest, ResourceStatistics)
src/tests/slave_recovery_tests.cpp:TEST_F(MesosContainerizerSlaveRecoveryTest, ResourceStatistics)
src/tests/disk_quota_tests.cpp:TEST_F(DiskQuotaTest, ResourceStatistics)
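
To illustrate what I mean by keeping the full name: here is a minimal sketch of tallying failures by the complete gtest `Suite.Test` name, so that the three tests above would be counted separately. I don't know how the actual CI statistics script works, so the function name, the regex, and the sample log lines are all my own assumptions; the only thing taken from gtest is the `[  FAILED  ] Suite.Test` line format.

```python
import re
from collections import Counter

# Matches gtest failure lines such as
# "[  FAILED  ] DiskQuotaTest.ResourceStatistics (12 ms)".
# Assumption: the CI logs contain raw gtest output in this format.
FAILED_LINE = re.compile(r"^\[\s+FAILED\s+\]\s+(?P<name>[\w/]+\.[\w/]+)")


def count_failures(log_lines):
    """Tally failures keyed by the full 'Suite.Test' name (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LINE.match(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Invented sample input, not real CI data.
sample_log = [
    "[  FAILED  ] DiskQuotaTest.ResourceStatistics (12 ms)",
    "[  FAILED  ] ROOT_XFS_QuotaTest.ResourceStatistics (40 ms)",
    "[  FAILED  ] DiskQuotaTest.ResourceStatistics (15 ms)",
    "[       OK ] DiskQuotaTest.SomePassingTest (9 ms)",
]

for name, count in count_failures(sample_log).most_common():
    print("%dx: %s" % (count, name))
```

With the suite name kept, a report entry points at exactly one test, which makes it much easier to tell who maintains it.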

Did we actually fix the flaky tests or did we disable them? I see only 22
disabled tests, which is better than I expected, but I hope there's good
tracking on getting these un-disabled again:

$ grep -R DISABLED src/tests | grep -v DISABLED_ON_WINDOWS |
    grep -v NestedQuota | grep -v ChildRole | grep -v NestedRoles |
    grep -v environment.cpp | wc -l
      22
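
For tracking purposes, a list of names is probably more useful than a bare count. Below is a rough sketch that pulls out `Suite.Test` for every test whose name starts with `DISABLED_`. It is not equivalent to the grep pipeline above: it only looks at the standard gtest macros (`TEST`, `TEST_F`, `TEST_P`, `TYPED_TEST`), ignores disabled suites and any Mesos-specific macros, and the helper name and sample source are made up for illustration.

```python
import re

# Matches gtest macros whose test name starts with DISABLED_, e.g.
# "TEST_F(SomeTest, DISABLED_SomeCase)". The \s* also tolerates a line
# break between the suite name and the test name.
DISABLED_TEST = re.compile(
    r"\b(?:TEST|TEST_F|TEST_P|TYPED_TEST)\(\s*(\w+)\s*,\s*(DISABLED_\w+)\s*\)")


def disabled_tests(source):
    """Return 'Suite.Test' for each disabled test in `source` (hypothetical helper)."""
    return ["%s.%s" % (suite, test)
            for suite, test in DISABLED_TEST.findall(source)]


# Invented sample source, not taken from the Mesos tree.
sample_source = """
TEST_F(DiskQuotaTest, ResourceStatistics)
{
}

TEST_F(ExampleTest,
       DISABLED_FlakyCase)
{
}
"""

print(disabled_tests(sample_source))
```

Running something like this over each file under src/tests would give a list that could be checked against open tickets.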

On Fri, Oct 12, 2018 at 7:38 AM Benno Evers <bev...@mesosphere.com> wrote:

> Hey all,
>
> as you might know, we've set up an internal CI system that is running
> `make check` on a variety of different platforms and configurations, 16 in
> total.
>
> As we've experienced more and more pain maintaining a green master, I've
> compiled some statistics about which tests are most flaky. I thought other
> people might also be interested to have a look at that data:
>
> Last Week:
>
>     # CI Statistics since 2018-10-05 14:22:35.422882 for branches containing 'asf/master'
>     Total: 41 failing tests, 28 unique. (avg 0.142361111111 failing tests per build)
>
>     Top 5 failing tests:
>     6x: [empty]
>     4x: ResourceStatistics
>     2x: CreateDestroyDiskRecovery
>     2x: INTERNET_CURL_InvokeFetchByName
>     2x: RecoverNestedContainer
>
> Last Month:
>
>     # CI Statistics since 2018-09-12 14:23:36.272031 for branches containing 'asf/master'
>     Total: 320 failing tests, 75 unique. (avg 0.285714285714 failing tests per build)
>
>     Top 5 failing tests:
>     57x: Used
>     32x: LongLivedDefaultExecutorRestart
>     27x: PythonFramework
>     23x: ROOT_CGROUPS_LaunchNestedContainerSessionsInParallel
>     22x: ResourceStatistics
>
> Last year:
>
>     # CI Statistics since 2017-10-12 14:24:31.639792 for branches containing 'asf/master'
>     Total: 3045 failing tests, 225 unique. (avg 0.184054642166 failing tests per build)
>
>     Top 5 failing tests:
>     292x: [empty]
>     272x: ROOT_LOGROTATE_UNPRIVILEGED_USER_RotateWithSwitchUserTrueOrFalse
>     136x: LOGROTATE_RotateInSandbox
>     136x: LOGROTATE_CustomRotateOptions
>     131x: ResourceStatistics
>
>
> I don't really have a point with all of this, but some observations:
>  - [empty] means that the `mesos-tests` binary crashed
>  - The data also includes "real", i.e. non-flaky test failures, but they
> should not appear in the top 5 lists because we would hopefully either
> revert or fix them before they can accumulate dozens of failures
>  - Over the whole year, we seem to be pretty good at fixing the nastiest
> flakes, with only one of the top 5 still appearing in this week's test
> results
>  - Sadly, the fail percentage isn't as different between now and then as we
> might have hoped.
>
> Hope this was interesting, and best regards,
> --
> Benno Evers
> Software Engineer, Mesosphere
>
