So to clarify it all in one place, the proposed new CI process we should
test for consensus is:

1. Canonical CI for a release is ci-cassandra. We can optionally run
CircleCI as well, and in practice will, but we don't codify blocking on it.
2. (NEW) We don't release unless we get a fully green run.
3. Before any merge, you need a non-regressing (i.e. no new failures) run
of either CircleCI (with a specific suite of tests, TBD) or ci-cassandra.
     3.a: Non-regressing is defined here as "doesn't introduce any new test
failures; any new failures in CI are clearly not attributable to this diff"
     3.b: (NEW) After a ticket is merged, ci-cassandra runs against the SHA
and the author gets an advisory update on the related JIRA for any new
errors on CI. The author of the ticket will take point on triaging the new
failure and either fixing it (if clearly reproducible or related to their
work) or opening a JIRA for the intermittent failure and linking it in
Butler (https://butler.cassandra.apache.org/#/)
4. (NEW) The Build Lead role + Butler catch and document all failures
and anything that slips through the procedural cracks in 3.b; resourcing
for fixing flaky tests TBD

Our two TBDs, which we can tackle separately from consensus on the above:
1. Suite of tests on CircleCI required to be considered ready for merge
2. How we resource fixing flaky tests that are functionally impossible to
attribute without essentially fixing the flake
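
To make the 3.a definition concrete, here's a rough sketch of how a
"non-regressing" check could be expressed. It only illustrates the wording
above and is not how ci-cassandra, CircleCI or Butler actually compute
anything. The function names, the `unrelated` parameter (failures a human
has triaged as clearly not attributable to the diff) and the test names are
all hypothetical.

```python
def new_failures(baseline_failures, candidate_failures):
    """Tests that fail in the candidate run but did not fail in the baseline."""
    return sorted(set(candidate_failures) - set(baseline_failures))


def is_non_regressing(baseline_failures, candidate_failures, unrelated=()):
    """3.a as code: every new failure must be triaged as unrelated to the diff.

    `unrelated` is a hypothetical, human-supplied list of failures judged
    clearly not attributable to the change under review.
    """
    triaged = set(unrelated)
    return all(t in triaged for t in new_failures(baseline_failures, candidate_failures))


# Hypothetical test names, for illustration only.
baseline = {"test_repair_flaky", "test_hints_timeout"}
candidate = {"test_repair_flaky", "test_new_feature"}

print(new_failures(baseline, candidate))
print(is_non_regressing(baseline, candidate))
print(is_non_regressing(baseline, candidate, unrelated=["test_new_feature"]))
```

The last call shows the escape hatch in 3.a: a new failure doesn't block the
merge if it has been triaged as clearly unrelated to the diff.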

On Fri, Dec 17, 2021 at 10:56 AM Ekaterina Dimitrova <e.dimitr...@gmail.com>
wrote:

> +1 (nb) on my end too, I second Mick
> Thanks for putting this together Josh
>
> On Fri, 17 Dec 2021 at 10:48, Mick Semb Wever <m...@apache.org> wrote:
>
> > >
> > >
> > > 3.c: (NEW) After merging tickets, run ci-cassandra (already do this)
> > > and get an advisory update on the related JIRA for any new errors on
> > > the run of the SHA
> > >
> > > I strongly prefer we amend our process with 3.c.
> >
> >
> >
> > +1   Yup, this is the most important missing piece for me.
> >
> > I also wouldn't mind we word the responsibility of the author at
> > post-commit fault to be involved/leading in the fix. This incentivises
> > people to do 2+3 properly, and not push it onto the build role.
> >
>
