This is great. Thanks Benno for sharing!

What did you use to do the analysis? I would love it if we could have graphs
that we can run on TVs.
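
For reference, here is a minimal sketch in Python of how a tally like the top-5 lists quoted below could be computed. I don't know what the real tooling looks like, so everything here is an assumption: the record layout, the names `failures`, `total_builds`, `short_name` and `top_failing`, and the failure records themselves, which are made up for illustration (only the three test names come from the grep output quoted below). It also shows why counting by the abridged name merges distinct tests.

```python
from collections import Counter

# Hypothetical failure records: (full test name, build id). NOT real CI data.
failures = [
    ("MesosContainerizerSlaveRecoveryTest.ResourceStatistics", 101),
    ("ROOT_XFS_QuotaTest.ResourceStatistics", 101),
    ("DiskQuotaTest.ResourceStatistics", 102),
    ("MesosContainerizerSlaveRecoveryTest.ResourceStatistics", 103),
    ("", 104),  # empty name: the test binary crashed without naming a test
]
total_builds = 10  # assumed number of builds in the reporting window


def short_name(full_name):
    """Drop the fixture prefix, mimicking the abridged names in the report."""
    return full_name.rsplit(".", 1)[-1] or "[empty]"


def top_failing(records, n=5, key=lambda name: name or "[empty]"):
    """Return the n most frequently failing tests as (name, count) pairs."""
    counts = Counter(key(name) for name, _build in records)
    return counts.most_common(n)


by_short = top_failing(failures, key=short_name)  # abridged names
by_full = top_failing(failures)                   # full names
avg = len(failures) / total_builds                # failing tests per build

for name, count in by_short:
    print(f"{count}x: {name}")
print(f"avg {avg} failing tests per build")
```

With abridged names, the three different ResourceStatistics tests collapse into a single entry, whereas keying on the full name keeps them apart. A list of (name, count) pairs like this would also be easy to feed into whatever ends up drawing the graphs.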

On Mon, Oct 15, 2018 at 5:23 AM Benno Evers <bev...@mesosphere.com> wrote:

> > Is there any reason the first portion of the test name is being
> > truncated?
>
> There is, although it is slightly embarrassing: we currently only store the
> detailed data, including full test case name and platform, for about a
> week; for anything older than that, the abridged version is the best I
> could find. The data should still be good, though, since we hopefully
> don't have two tests with the same name that are both frequently flaky.
>
> In particular, the ResourceStatistics refers to the
> 'MesosContainerizerSlaveRecoveryTest.ResourceStatistics' test tracked
> in MESOS-5048.
>
> On Fri, Oct 12, 2018 at 7:03 PM Benjamin Mahler <bmah...@apache.org>
> wrote:
>
> > Thanks for sending this Benno! I for one would love to see more regular
> > communication about the state of CI, especially so that I know how I can
> > help fix tests (right now I don't know which flaky tests are in areas I
> > am maintaining).
> >
> > Is there any reason the first portion of the test name is being
> > truncated?
> > For example, ResourceStatistics matches several tests:
> >
> > $ grep -R ' ResourceStatistics)' src/tests
> > src/tests/containerizer/xfs_quota_tests.cpp:TEST_F(ROOT_XFS_QuotaTest, ResourceStatistics)
> > src/tests/slave_recovery_tests.cpp:TEST_F(MesosContainerizerSlaveRecoveryTest, ResourceStatistics)
> > src/tests/disk_quota_tests.cpp:TEST_F(DiskQuotaTest, ResourceStatistics)
> >
> > Did we actually fix the flaky tests or did we disable them? I see only 22
> > disabled tests, which is better than I expected, but I hope there's good
> > tracking on getting these un-disabled again:
> >
> > $ grep -R DISABLED src/tests | grep -v DISABLED_ON_WINDOWS \
> >     | grep -v NestedQuota | grep -v ChildRole | grep -v NestedRoles \
> >     | grep -v environment.cpp | wc -l
> >       22
> >
> > On Fri, Oct 12, 2018 at 7:38 AM Benno Evers <bev...@mesosphere.com> wrote:
> >
> > > Hey all,
> > >
> > > as you might know, we've set up an internal CI system that is running
> > > `make check` on a variety of different platforms and configurations,
> > > 16 in total.
> > >
> > > As we've experienced more and more pain maintaining a green master,
> > > I've compiled some statistics about which tests are most flaky. I
> > > thought other people might also be interested to have a look at that
> > > data:
> > >
> > > Last Week:
> > >
> > >     # CI Statistics since 2018-10-05 14:22:35.422882 for branches containing 'asf/master'
> > >     Total: 41 failing tests, 28 unique. (avg 0.142361111111 failing tests per build)
> > >
> > >     Top 5 failing tests:
> > >     6x: [empty]
> > >     4x: ResourceStatistics
> > >     2x: CreateDestroyDiskRecovery
> > >     2x: INTERNET_CURL_InvokeFetchByName
> > >     2x: RecoverNestedContainer
> > >
> > > Last Month:
> > >
> > >     # CI Statistics since 2018-09-12 14:23:36.272031 for branches containing 'asf/master'
> > >     Total: 320 failing tests, 75 unique. (avg 0.285714285714 failing tests per build)
> > >
> > >     Top 5 failing tests:
> > >     57x: Used
> > >     32x: LongLivedDefaultExecutorRestart
> > >     27x: PythonFramework
> > >     23x: ROOT_CGROUPS_LaunchNestedContainerSessionsInParallel
> > >     22x: ResourceStatistics
> > >
> > > Last Year:
> > >
> > >     # CI Statistics since 2017-10-12 14:24:31.639792 for branches containing 'asf/master'
> > >     Total: 3045 failing tests, 225 unique. (avg 0.184054642166 failing tests per build)
> > >
> > >     Top 5 failing tests:
> > >     292x: [empty]
> > >     272x: ROOT_LOGROTATE_UNPRIVILEGED_USER_RotateWithSwitchUserTrueOrFalse
> > >     136x: LOGROTATE_RotateInSandbox
> > >     136x: LOGROTATE_CustomRotateOptions
> > >     131x: ResourceStatistics
> > >
> > >
> > > I don't really have a point with all of this, but some observations:
> > >  - [empty] means that the `mesos-tests` binary crashed
> > >  - The data also includes "real", i.e. non-flaky, test failures, but
> > > they should not appear in the top 5 lists because we would hopefully
> > > either revert or fix them before they can accumulate dozens of failures
> > >  - Over the whole year, we seem to be pretty good at fixing the nastiest
> > > flakes, with only one of the top 5 still appearing in this week's test
> > > results
> > >  - Sadly, the fail percentage isn't as different between now and then
> > > as we might have hoped.
> > >
> > > Hope this was interesting, and best regards,
> > > --
> > > Benno Evers
> > > Software Engineer, Mesosphere
> > >
> >
>
>
> --
> Benno Evers
> Software Engineer, Mesosphere
>
