Folks,

The state of our CI (both the Apache CI and the internal one we run at
Mesosphere) has recently degraded to the point where people no longer look
at its failures. This defeats the primary purpose of a CI: to produce a
reliable signal when a change breaks something.

You might have seen a bunch of commits fixing flaky tests and bugs over the
past two weeks. This is the beginning of our effort to bring the CI back to
a green state. To track the effort, there is a swim lane on our tech debt
board [1] and a cumulative flow diagram [2]. I believe some of the older
tickets are no longer relevant; I will clean them up once I have a better
sense of the actual state.

If you would like to help, please watch out for new flakiness that your
changes might introduce. Apache CI apparently has a quirk where a test run
can pause for 15 seconds or more, leading to arbitrary test failures. These
are false positives, but the pattern is easy to recognize in the logs.

We also have a dedicated channel in the Apache Mesos Slack: #ci-back-to-green

If you would like to participate, here is the list of the biggest offenders
that have not been triaged yet: MESOS-7519, MESOS-7082, MESOS-7434,
MESOS-7512, MESOS-7742, MESOS-7028, MESOS-7425, MESOS-7106, MESOS-7337,
MESOS-7273, MESOS-6724, MESOS-8112, MESOS-6949, MESOS-8000, MESOS-8047

Alex.

[1]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=151&view=detail&selectedIssue=MESOS-8005
[2]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=204&view=reporting&chart=cumulativeFlowDiagram&swimlane=501&column=774&column=775&column=776&days=7

Reply via email to