This is great. Thanks for sharing, Benno! What did you use for the analysis? I would love it if we could have graphs that we can display on TVs.

On Mon, Oct 15, 2018 at 5:23 AM Benno Evers <bev...@mesosphere.com> wrote:

> > Is there any reason the first portion of the test name is being
> > truncated?
>
> There is, although it is slightly embarrassing: We currently only store
> the detailed data including full test case name and platform for about a
> week, for anything older than that the abridged version is the best I
> could find. The data should still be good, though, since we hopefully
> don't have two tests with the same name that are both frequently flaky.
>
> In particular, the ResourceStatistics refers to the
> 'MesosContainerizerSlaveRecoveryTest.ResourceStatistics' test tracked
> in MESOS-5048.
>
> On Fri, Oct 12, 2018 at 7:03 PM Benjamin Mahler <bmah...@apache.org> wrote:
>
> > Thanks for sending this Benno! I for one would love to see more regular
> > communication about the state of CI, especially so that I know how I can
> > help fix tests (right now I don't know which flaky tests are in areas I
> > am maintaining).
> >
> > Is there any reason the first portion of the test name is being
> > truncated? For example, ResourceStatistics matches several tests:
> >
> > $ grep -R ' ResourceStatistics)' src/tests
> > src/tests/containerizer/xfs_quota_tests.cpp:TEST_F(ROOT_XFS_QuotaTest, ResourceStatistics)
> > src/tests/slave_recovery_tests.cpp:TEST_F(MesosContainerizerSlaveRecoveryTest, ResourceStatistics)
> > src/tests/disk_quota_tests.cpp:TEST_F(DiskQuotaTest, ResourceStatistics)
> >
> > Did we actually fix the flaky tests or did we disable them? I see only 22
> > disabled tests, which is better than I expected, but I hope there's good
> > tracking on getting these un-disabled again:
> >
> > $ grep -R DISABLED src/tests | grep -v DISABLED_ON_WINDOWS | grep -v NestedQuota | grep -v ChildRole | grep -v NestedRoles | grep -v environment.cpp | wc -l
> > 22
> >
> > On Fri, Oct 12, 2018 at 7:38 AM Benno Evers <bev...@mesosphere.com> wrote:
> >
> > > Hey all,
> > >
> > > as you might know, we've set up an internal CI system that is running
> > > `make check` on a variety of different platforms and configurations,
> > > 16 in total.
> > >
> > > As we've experienced more and more pain maintaining a green master,
> > > I've compiled some statistics about which tests are most flaky. I
> > > thought other people might also be interested to have a look at that
> > > data:
> > >
> > > Last Week:
> > >
> > >   # CI Statistics since 2018-10-05 14:22:35.422882 for branches containing 'asf/master'
> > >   Total: 41 failing tests, 28 unique. (avg 0.142361111111 failing tests per build)
> > >
> > >   Top 5 failing tests:
> > >     6x: [empty]
> > >     4x: ResourceStatistics
> > >     2x: CreateDestroyDiskRecovery
> > >     2x: INTERNET_CURL_InvokeFetchByName
> > >     2x: RecoverNestedContainer
> > >
> > > Last Month:
> > >
> > >   # CI Statistics since 2018-09-12 14:23:36.272031 for branches containing 'asf/master'
> > >   Total: 320 failing tests, 75 unique. (avg 0.285714285714 failing tests per build)
> > >
> > >   Top 5 failing tests:
> > >     57x: Used
> > >     32x: LongLivedDefaultExecutorRestart
> > >     27x: PythonFramework
> > >     23x: ROOT_CGROUPS_LaunchNestedContainerSessionsInParallel
> > >     22x: ResourceStatistics
> > >
> > > Last year:
> > >
> > >   # CI Statistics since 2017-10-12 14:24:31.639792 for branches containing 'asf/master'
> > >   Total: 3045 failing tests, 225 unique. (avg 0.184054642166 failing tests per build)
> > >
> > >   Top 5 failing tests:
> > >     292x: [empty]
> > >     272x: ROOT_LOGROTATE_UNPRIVILEGED_USER_RotateWithSwitchUserTrueOrFalse
> > >     136x: LOGROTATE_RotateInSandbox
> > >     136x: LOGROTATE_CustomRotateOptions
> > >     131x: ResourceStatistics
> > >
> > >
> > > I don't really have a point with all of this, but some observations:
> > >  - [empty] means that the `mesos-tests` binary crashed
> > >  - The data also includes "real", i.e. non-flaky test failures, but
> > >    they should not appear in the top 5 lists because we would hopefully
> > >    either revert or fix them before they can accumulate dozens of
> > >    failures
> > >  - Over the whole year, we seem to be pretty good at fixing the
> > >    nastiest flakes, with only one of the top 5 still appearing in this
> > >    weeks test results
> > >  - Sadly, the fail percentage isn't as different between now and then
> > >    as we might have hoped.
> > >
> > > Hope this was interesting, and best regards,
> > > --
> > > Benno Evers
> > > Software Engineer, Mesosphere
> >
>
> --
> Benno Evers
> Software Engineer, Mesosphere
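
For anyone who wants to produce a similar summary from their own CI data, here is a minimal sketch in Python. It is not the script used for the numbers quoted above, which the thread does not show. The function name `summarize`, its input shape (one test name per recorded failure, with an empty string when no test name was captured), and the 12-digit average formatting are all assumptions made for illustration.

```python
from collections import Counter


def summarize(failures, num_builds, top_n=5):
    """Build a text summary in the style of the quoted CI statistics.

    failures:   hypothetical input, one test name per recorded failure;
                an empty string stands for a run where no test name was
                captured (reported as "[empty]").
    num_builds: total number of builds in the period, assumed > 0.
    """
    # Count how often each test name appears among the failures.
    counts = Counter(name if name else "[empty]" for name in failures)
    average = len(failures) / num_builds

    lines = [
        "Total: {} failing tests, {} unique. "
        "(avg {:.12g} failing tests per build)".format(
            len(failures), len(counts), average
        ),
        "",
        "Top {} failing tests:".format(top_n),
    ]
    # most_common() returns (name, count) pairs sorted by descending count.
    for name, count in counts.most_common(top_n):
        lines.append("{}x: {}".format(count, name))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, not taken from the real CI system.
    sample = (
        ["ResourceStatistics"] * 4
        + [""] * 6
        + ["PythonFramework"] * 2
    )
    print(summarize(sample, num_builds=48))
```

Because only the short test name is counted here, tests that share a name across fixtures (such as the three `ResourceStatistics` tests found by the grep above) would be merged into one entry; keying on the full `Fixture.TestName` would avoid that wherever the full name is available.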