What if we tried the following:

1. Canonical CI for a release is ci-cassandra. We can optionally, and in
practice will, run circle as well but don't codify blocking on that.
2. (NEW) We don't release unless we get a fully green run.
3. Before any merge, you need either a non-regressing (i.e. no new
failures) run of circleci or of ci-cassandra.
     3.a Non-regressing is defined here as "Doesn't introduce any new test
failures; any new failures in CI are clearly not attributable to this diff"
4. (NEW) The Build Lead role + Butler catches and documents new
intermittent failures; it's unspecified how we resource fixing those
collectively at this time.

2 raises the specter of flaky tests unique to apache infra greatly delaying
releases. I can think of a few options to help keep us from regressing on
ci-cassandra (numbered to indicate where they fit in / replace the flow
above):

3.a: (NEW) Before merging tickets, block on a clean run of ci-cassandra (need
something like merge trains; could automate merging, hard / impossible
w/merge commits)
3.b: (NEW) Before merging tickets, run ci-cassandra and get an advisory
update on the related JIRA (extra ci runtime burden; long delays w/out CI
tests or infra optimization)
3.c: (NEW) After merging tickets, run ci-cassandra (already do this) and
get an advisory update on the related JIRA for any new errors on the run of
the SHA

I strongly prefer we amend our process with 3.c. I'm pretty sure we could
get granular enough to compute any new test failures and highlight them in
the JIRA ticket and link to the run + the previous run. I believe this
would greatly tighten the loop between a delta and a failure for a variety
of tests, and 4 above would provide the fail-safe for us to catch and
address flakes far earlier than the current model.
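
To make "compute any new test failures" concrete, here's a rough sketch,
assuming we can pull the set of failed test names for a run of a SHA and for
the run before it. CiRun, new_failures, advisory_comment and the comment
format are all made up for illustration; none of this is something
ci-cassandra or Butler exposes today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CiRun:
    """Hypothetical summary of one ci-cassandra run (not a real API)."""
    sha: str
    url: str
    failed_tests: frozenset


def new_failures(current: CiRun, previous: CiRun) -> list:
    """Tests failing in `current` that were not failing in `previous`."""
    return sorted(current.failed_tests - previous.failed_tests)


def advisory_comment(current: CiRun, previous: CiRun) -> str:
    """Build the advisory text for the related JIRA (format is illustrative)."""
    regressions = new_failures(current, previous)
    if not regressions:
        return f"No new test failures on {current.sha} ({current.url})."
    lines = [f"New test failures on {current.sha}:"]
    lines += [f"  - {name}" for name in regressions]
    lines.append(f"This run: {current.url}")
    lines.append(f"Previous run: {previous.url}")
    return "\n".join(lines)
```

A plain set difference like this will also flag flakes as "new", which is
why the update stays advisory and why 4 is still needed as the fail-safe.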

~Josh

On Thu, Dec 16, 2021 at 1:20 PM Mick Semb Wever <m...@apache.org> wrote:

> >
> >
> > > ci-cassandra.a.o needs to be our canonical CI
> >
> > > it's the only one fully usable by a volunteer based
> >
> >
> > > only green in both counts as green
> >
> > I think today might just be my day to annoy you Mick. :D Sorry!
> >
>
>
> On the day I'm laid up in bed with a cold.
> Go for gold :-)
>
> > I think this is contradictory. We can't require circle to be green for a
> > release if the free tier usage of it a) doesn't pass tests, and/or b)
> > requires a license incompatible w/some contributors. That effectively would
> > make circle + asf ci our canonical ci, right?
> >
>
>
> That's taking it out of (or twisting) my context a bit, let me explain…
>
> First, I did not mean the free tier. It is not usable AFAIK. It could be
> updated so it was constrained in what it could run and was stable, but then
> it's not complete so there's limited value here. IMHO plugging in GitHub
> Actions to do a very basic build+test would hit a larger newcomer audience.
>
> Second, I didn't mean one *had* to run both. Just like post-commit will
> catch things, just so long as that breakage comes around to you and you
> accept your involvement in it. We (the whole community) need to help out
> when the author cannot reproduce/debug the failure, and this isn't just
> limited to premium circleci.
>
>
> > > less flakies than the previous release
> >
> > This statement makes me wary. :) Why not "no test failures"?
> >
>
>
> More than happy to go for that. And I damn hope we are there for our next
> major release.
> This statement was more just a preference to lean on the more pragmatic
> side. We know our north star, keep moving towards it.
>
