Folks, the state of our CI (both the Apache one and the internal one we have at Mesosphere) has recently degraded to the point where people no longer look at its failures. This defeats the primary purpose of a CI: to produce a reliable signal when a change breaks something.
You might have seen a bunch of commits fixing flaky tests and bugs over the past two weeks; this is the beginning of our effort to bring the CI back to a green state. To track the effort, there is a swim lane in our tech debt board [1] and a flow diagram [2]. I believe that some of the older tickets are no longer relevant; I will do a cleanup once I have a better feeling for the actual state.

If you would like to help, watch out for new flakiness that new changes might introduce. Apache CI apparently has a quirk where a test run can pause for 15+ seconds, leading to arbitrary test failures. Such failures are false positives, but the pattern is easily recognizable in the logs.

We also have a dedicated channel in the Apache Mesos Slack: #ci-back-to-green

If you would like to participate, here is the list of the biggest offenders that are not triaged yet:
MESOS-7519, MESOS-7082, MESOS-7434, MESOS-7512, MESOS-7742, MESOS-7028, MESOS-7425, MESOS-7106, MESOS-7337, MESOS-7273, MESOS-6724, MESOS-8112, MESOS-6949, MESOS-8000, MESOS-8047

Alex.

[1] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=151&view=detail&selectedIssue=MESOS-8005
[2] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=204&view=reporting&chart=cumulativeFlowDiagram&swimlane=501&column=774&column=775&column=776&days=7