What if we tried the following:

1. Canonical CI for a release is ci-cassandra. We can optionally, and in practice will, run CircleCI as well, but we don't codify blocking on it.
2. (NEW) We don't release unless we get a fully green run.
3. Before any merge, you need a non-regressing (i.e. no new failures) run of either CircleCI or ci-cassandra.
   3.a Non-regressing is defined here as "Doesn't introduce any new test failures; any new failures in CI are clearly not attributable to this diff."
4. (NEW) The Build Lead role, together with Butler, catches and documents new intermittent failures; how we collectively resource fixing those is unspecified at this time.
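The release gate in 2 and the non-regression rule in 3.a are mechanical enough to sketch. The Python below is a minimal illustration under stated assumptions, not existing Cassandra tooling: `CiRun`, `new_failures`, `is_non_regressing` and `is_release_ready` are hypothetical names, a CI run is reduced to the set of test names that failed, and "clearly not attributable to this diff" is approximated as "already documented as intermittent", i.e. the list that 4 would produce.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CiRun:
    """Hypothetical summary of one CI run (ci-cassandra or CircleCI)."""
    sha: str  # commit the run tested
    failed_tests: frozenset[str] = field(default_factory=frozenset)


def new_failures(previous: CiRun, current: CiRun) -> set[str]:
    """Tests that fail in `current` but did not fail in `previous`."""
    return set(current.failed_tests - previous.failed_tests)


def is_non_regressing(previous: CiRun, current: CiRun,
                      known_flaky: frozenset[str] = frozenset()) -> bool:
    """Rule 3.a, approximated: no new failures except already-documented flakes.

    Assumption: "clearly not attributable to this diff" is modeled as
    "already on the Build Lead's list of known intermittent failures".
    """
    return not (new_failures(previous, current) - known_flaky)


def is_release_ready(run: CiRun) -> bool:
    """Rule 2: only a fully green run is releasable."""
    return not run.failed_tests
```

With a previous run failing only `TestA` and a new run failing `TestA` and `TestB`, `new_failures` reports `{"TestB"}`; the new run counts as non-regressing only if `TestB` is already on the known-flaky list, and neither run is releasable under 2.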
Point 2 raises the specter of flaky tests unique to the Apache infra greatly delaying releases. I can think of a few options to help keep us from regressing on ci-cassandra (numbered to indicate where they fit into, or replace, the flow above):

3.a: (NEW) Before merging tickets, block on a clean run of ci-cassandra (we'd need something like merge trains; merging could be automated, but that's hard or impossible with merge commits)
3.b: (NEW) Before merging tickets, run ci-cassandra and get an advisory update on the related JIRA (extra CI runtime burden; long delays unless we optimize the CI tests or infra)
3.c: (NEW) After merging tickets, run ci-cassandra (we already do this) and get an advisory update on the related JIRA for any new errors in the run of that SHA

I strongly prefer we amend our process with 3.c. I'm pretty sure we could get granular enough to compute any new test failures, highlight them in the JIRA ticket, and link to both that run and the previous one. I believe this would greatly tighten the loop between a change and a failure for a variety of tests, and 4 above would provide the fail-safe for us to catch and address flakes far earlier than under the current model.

~Josh

On Thu, Dec 16, 2021 at 1:20 PM Mick Semb Wever <m...@apache.org> wrote:

> > > ci-cassandra.a.o needs to be our canonical CI
> > > it's the only one fully usable by a volunteer based
> > >
> > > only green in both counts as green
> >
> > I think today might just be my day to annoy you Mick. :D Sorry!
>
> On the day I'm laid up in bed with a cold.
> Go for gold :-)
>
> > I think this is contradictory. We can't require circle to be green for a
> > release if the free tier usage of it a) doesn't pass tests, and/or b)
> > requires a license incompatible with some contributors. That would
> > effectively make circle + ASF CI our canonical CI, right?
>
> That's taking my context out (or twisting it) a bit, let me explain…
>
> First, I did not mean the free tier. It is not usable AFAIK.
> It could be updated so it was constrained in what it could run and was
> stable, but then it's not complete, so there's limited value there. IMHO
> plugging in GitHub Actions to do a very basic build+test would reach a
> larger newcomer audience.
>
> Second, I didn't mean one *had* to run both. Just as post-commit will catch
> things, that's fine so long as the breakage comes back around to you and
> you accept your involvement in it. We (the whole community) need to help
> out when the author cannot reproduce/debug the failure, and this isn't
> limited to premium circleci.
>
> > > less flakies than the previous release
> >
> > This statement makes me wary. :) Why not "no test failures"?
>
> More than happy to go for that. And I damn well hope we are there for our
> next major release.
> This statement was more just a preference to lean toward the more pragmatic
> side. We know our north star; keep moving towards it.