> the need for some external pressure to maintain build quality, and the
> best solution proposed (to my mind) was the use of GitHub actions to
> integrate with various CI services to refuse PRs that do not have a clean
> test run

Honestly, I agree 100% with this. I took the more conservative approach (refine and standardize what we have + reduce friction) but I've long been a believer in intentionally setting up incentives and disincentives to shape behavior. So let me pose the question here to the list: is there anyone who would like to advocate for the current merge strategy (apply to oldest LTS, merge up, often -s ours w/new patch applied + amend) instead of "apply to trunk and cherry-pick back to LTS"? If we make this change we'll be able to integrate w/github actions and block merge on green CI + integrate git revert into our processes.
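For concreteness, the "apply to trunk and cherry-pick back to LTS" flow might look roughly like the following. This is a minimal sketch only: the feature branch name and commit hashes are placeholders, and the CI gates referenced are the ones discussed later in the thread.

# Sketch: land the change on trunk first (after review and a green CI run).
git checkout trunk
git merge --ff-only CASSANDRA-XXXXX-trunk   # placeholder feature branch name
git push origin trunk

# Re-apply the same commit to whichever LTS branches need the fix.
# -x records "cherry picked from commit ..." in the message for traceability.
git checkout cassandra-4.0
git cherry-pick -x <trunk-commit-sha>
git push origin cassandra-4.0

# If a landed commit later breaks CI, prefer reverting to leaving the board red;
# the author re-lands once the branch is green again.
git checkout trunk
git revert <offending-commit-sha>
git push origin trunk

Compared to the current merge-up strategy, each branch ends up with an independent, individually CI-testable commit, which is what makes PR-level gating and git revert straightforward.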
On Tue, Dec 7, 2021 at 9:08 AM bened...@apache.org <bened...@apache.org> wrote:

> > My personal opinion is we'd be well served to do trunk-based development with cherry-picks … to LTS release branches
>
> Agreed.
>
> > that's somewhat orthogonal … to the primary thing this discussion surfaced for me
>
> The primary outcome of the discussion for me was the need for some external pressure to maintain build quality, and the best solution proposed (to my mind) was the use of GitHub actions to integrate with various CI services to refuse PRs that do not have a clean test run. This doesn’t fully resolve flakiness, but it does provide 95%+ of the necessary pressure to maintain test quality, and a consistent way of determining that.
>
> This is how a lot of other projects maintain correctness, and I think how many forks of Cassandra are maintained outside of the project as well.
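As a rough illustration of the kind of gate being described here, GitHub branch protection can refuse to merge a PR until named status checks report green. This is a sketch of the mechanism only: the check names and repository path are placeholders, and an ASF project would normally route such configuration through ASF infrastructure rather than calling the API directly.

# Sketch only: require green CI checks before a PR can merge into trunk.
cat > protection.json <<'EOF'
{
  "required_status_checks": {
    "strict": true,
    "contexts": ["ci/unit-tests-jdk8", "ci/unit-tests-jdk11", "ci/python-dtests"]
  },
  "enforce_admins": true,
  "required_pull_request_reviews": null,
  "restrictions": null
}
EOF
gh api -X PUT repos/<org>/<repo>/branches/trunk/protection --input protection.json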
> From: Joshua McKenzie <jmcken...@apache.org>
> Date: Tuesday, 7 December 2021 at 13:08
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Releasable trunk and quality
>
> > it would be far preferable for consistency of behaviour to rely on shared infrastructure if possible
>
> For those of us using CircleCI, we can get a lot of the benefit by having a script that rewrites and cleans up circle profiles based on use-case; it's a shared / consistent environment and the scripting approach gives us flexibility to support different workflows with minimal friction (build and run every push vs. click to trigger, for example).
>
> Is there a reason we discounted modifying the merge strategy?
>
> I took a stab at enumerating some of the current "best in class" approaches I could find here:
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.9b52fp49pp3y
> My personal opinion is we'd be well served to do trunk-based development, cherry-picking (and by that I mean basically re-applying) bugfixes back to LTS release branches (or perhaps doing the bugfix on the oldest LTS and applying up, tomato tomahto), doing away with merge commits, and using git revert more liberally when a commit breaks CI or introduces instability into it.
>
> All that said, that's somewhat orthogonal (or perhaps complementary) to the primary thing this discussion surfaced for me, which is that we don't have standardization or guidance across what tests, on what JDK's, with what config, etc. we run before commits today. My thinking is to get some clarity for everyone on that front, reduce friction to encourage that behavior, and then visit the merge strategy discussion independently after that.
>
> ~Josh
>
> On Tue, Dec 7, 2021 at 1:08 AM Berenguer Blasi <berenguerbl...@gmail.com> wrote:
>
> > +1. I would add a 'post-commit' step: check the Jenkins CI run for your merge and see if something broke regardless.
>
> > On 6/12/21 23:51, Ekaterina Dimitrova wrote:
> > > Hi Josh,
> > > All good questions, thank you for raising this topic. To the best of my knowledge, we don't have those documented, but I will put notes on what tribal knowledge I know about and personally follow :-)
> > >
> > > Pre-commit test suites:
> > > * Which JDK's? - both are officially supported, so both.
> > > * When to include all python tests or do JVM only (if ever)? - probably only if I'm testing a test fix.
> > > * When to run upgrade tests? - I haven't heard any definitive guideline. Preferably every time, but if there is a tiny change I guess it can be decided for them to be skipped. I would advocate to do more than less.
> > > * What to do if a test is also failing on the reference root (i.e. trunk, cassandra-4.0, etc)? - check if a ticket exists already; if not, open one at least, even if I don't plan to work on it, to acknowledge the issue and add any info I know about. If we know who broke it, ping the author to check it.
> > > * What to do if a test fails intermittently? - Open a ticket. During investigation, use the CircleCI jobs for running tests in a loop to find when it fails or to verify the test was fixed. (This is already in my draft CircleCI document, not yet released as it was pending on the documents migration.)
> > >
> > > Hope that helps.
> > >
> > > ~Ekaterina
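A local pre-commit pass along the lines described above might look roughly like this. It is a sketch under stated assumptions: the ant property, the dtest flag, and the paths are illustrative rather than authoritative, so defer to the project's testing documentation for the exact invocations.

# Sketch: run the unit suite on each supported JDK (adjust paths to your installs).
export JAVA_HOME=/path/to/jdk8        # repeat the run with the JDK 11 install
ant test

# Python dtests live in the separate cassandra-dtest repo.
cd ../cassandra-dtest
pytest --cassandra-dir=../cassandra

# Chase an intermittent failure by looping one test until it breaks.
cd ../cassandra
for i in $(seq 1 50); do
    ant test -Dtest.name=SomeFlakyTest || break
done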
> > > On Mon, 6 Dec 2021 at 17:20, Joshua McKenzie <jmcken...@apache.org> wrote:
> > >
> > >> As I work through the scripting on this, I don't know if we've documented or clarified the following (don't see it here: https://cassandra.apache.org/_/development/testing.html):
> > >>
> > >> Pre-commit test suites:
> > >> * Which JDK's?
> > >> * When to include all python tests or do JVM only (if ever)?
> > >> * When to run upgrade tests?
> > >> * What to do if a test is also failing on the reference root (i.e. trunk, cassandra-4.0, etc)?
> > >> * What to do if a test fails intermittently?
> > >>
> > >> I'll also update the above linked documentation once we hammer this out, and try to bake it into the scripting flow as much as possible as well. The goal is to make it easy to do the right thing and hard to do the wrong thing, and to have these things written down rather than have it be tribal knowledge that varies a lot across the project.
> > >>
> > >> ~Josh
> > >>
> > >> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jmcken...@apache.org> wrote:
> > >>
> > >>> After some offline collab, here's where this thread has landed on a proposal to incrementally improve our processes and hopefully stabilize the state of CI longer term:
> > >>>
> > >>> Link:
> > >>> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
> > >>>
> > >>> Hopefully the mail server doesn't butcher formatting; if it does, hit up the gdoc and leave comments there as it should be open to all.
> > >>>
> > >>> Phase 1:
> > >>> Document merge criteria; update circle jobs to have a simple pre-merge job (one for each JDK profile)
> > >>>   * Donate, document, and formalize usage of circleci-enable.py in ASF repo (need new commit scripts / dev tooling section?)
> > >>>     * rewrites circle config jobs to simple clear flow
> > >>>     * ability to toggle between "run on push" or "click to run"
> > >>>     * Variety of other functionality; see below
> > >>> Document (site, help, README.md) and automate via scripting the relationship / dev / release process around:
> > >>>   * In-jvm dtest
> > >>>   * dtest
> > >>>   * ccm
> > >>> Integrate and document usage of script to build CI repeat test runs
> > >>>   * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
> > >>>   * Document “Do this if you add or change tests”
> > >>> Introduce “Build Lead” role
> > >>>   * Weekly rotation; volunteer
> > >>>   * 1: Make sure JIRAs exist for test failures
> > >>>   * 2: Attempt to triage new test failures to root cause and assign out
> > >>>   * 3: Coordinate and drive to green board on trunk
> > >>> Change and automate process for *trunk only* patches:
> > >>>   * Block on green CI (from merge criteria in CI above; potentially stricter definition of "clean" for trunk CI)
> > >>>   * Consider using github PR’s to merge (TODO: determine how to handle circle + CHANGES; see below)
> > >>> Automate process for *multi-branch* merges
> > >>>   * Harden / contribute / document dcapwell's script (he has one which does the following):
> > >>>     * rebases your branch to the latest (if on 3.0 then rebase against cassandra-3.0)
> > >>>     * check compiles
> > >>>     * removes all changes to .circle (can opt-out for circleci patches)
> > >>>     * removes all changes to CHANGES.txt and leverages JIRA for the content
> > >>>     * checks code still compiles
> > >>>     * changes circle to run ci
> > >>>     * push to a temp branch in git and run CI (circle + Jenkins)
> > >>>       * when all branches are clean (waiting step is manual)
> > >>>       * TODO: Define “clean”
> > >>>         * No new test failures compared to reference?
> > >>>         * Or no test failures at all?
> > >>>     * merge changes into the actual branches
> > >>>     * merge up changes; rewriting diff
> > >>>     * push --atomic
> > >>>
> > >>> Transition to phase 2 when:
> > >>>   * All items from phase 1 are complete
> > >>>   * Test boards for supported branches are green
> > >>>
> > >>> Phase 2:
> > >>>   * Add Harry to recurring run against trunk
> > >>>   * Add Harry to release pipeline
> > >>>   * Suite of perf tests against trunk recurring
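For concreteness, the multi-branch automation summarized in the outline above might look roughly like the sketch below. The branch naming, remote, and build command are assumptions for illustration; this is not the actual dcapwell script.

# Sketch: prep each branch of a multi-branch bugfix, run CI on temp branches,
# then land everything together once the boards are clean.
# Assumes hypothetical feature branches named CASSANDRA-XXXXX-3.0, -4.0, -trunk.
for base in cassandra-3.0 cassandra-4.0 trunk; do
    suffix=${base#cassandra-}                     # "3.0", "4.0", "trunk"
    git checkout CASSANDRA-XXXXX-${suffix}
    git rebase origin/${base}                     # rebase onto the latest base branch
    ant jar                                       # confirm it still compiles
    git checkout origin/${base} -- .circleci CHANGES.txt   # drop local circle/CHANGES edits
    git commit --amend --no-edit
    git push -f origin HEAD:ci/CASSANDRA-XXXXX-${suffix}   # temp branch for circle + Jenkins
done

# Manual waiting step: confirm every temp branch has a clean run, then land atomically.
git push --atomic origin \
    CASSANDRA-XXXXX-3.0:cassandra-3.0 \
    CASSANDRA-XXXXX-4.0:cassandra-4.0 \
    CASSANDRA-XXXXX-trunk:trunk

The push --atomic at the end is what makes the per-branch commits land together or not at all, which matches the "waiting step is manual" note in the outline.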
> > >>> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jmcken...@apache.org> wrote:
> > >>>
> > >>>> Sorry for not catching that Benedict, you're absolutely right. So long as we're using merge commits between branches, I don't think auto-merging via train or blocking on green CI are options via the tooling, and multi-branch reverts will be something we should document very clearly should we even choose to go that route (a lot of room to make mistakes there).
> > >>>>
> > >>>> It may not be a huge issue, as we can expect the more disruptive (i.e. potentially destabilizing) changes to be happening on trunk only, so perhaps we can get away with slightly different workflows or policies based on whether you're doing a multi-branch bugfix or a feature on trunk. Bears thinking more deeply about.
> > >>>>
> > >>>> I'd also be game for revisiting our merge strategy. I don't see much difference in labor between merging between branches vs. preparing separate patches for an individual developer; however, I'm sure there are maintenance and integration implications there I'm not thinking of right now.
> > >>>>
> > >>>> On Wed, Nov 17, 2021 at 12:03 PM bened...@apache.org <bened...@apache.org> wrote:
> > >>>>
> > >>>>> I raised this before, but to highlight it again: how do these approaches interface with our merge strategy?
> > >>>>>
> > >>>>> We might have to rebase several dependent merge commits and want to merge them atomically. So far as I know these tools don’t work fantastically in this scenario, but if I’m wrong that’s fantastic. If not, given how important these things are, should we consider revisiting our merge strategy?
> > >>>>>
> > >>>>> From: Joshua McKenzie <jmcken...@apache.org>
> > >>>>> Date: Wednesday, 17 November 2021 at 16:39
> > >>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > >>>>> Subject: Re: [DISCUSS] Releasable trunk and quality
> > >>>>>
> > >>>>> Thanks for the feedback and insight Henrik; it's valuable to hear how other large, complex infra projects have tackled this problem set.
> > >>>>>
> > >>>>> To attempt to summarize, what I got from your email:
> > >>>>>
> > >>>>> [Phase one]
> > >>>>> 1) Build Barons: rotation where there's always someone active tying failures to changes and adding those failures to our ticketing system
> > >>>>> 2) Best effort process of "test breakers" being assigned tickets to fix the things their work broke
> > >>>>> 3) Moving to a culture where we regularly revert commits that break tests
> > >>>>> 4) Running tests before we merge changes
> > >>>>>
> > >>>>> [Phase two]
> > >>>>> 1) Suite of performance tests on a regular cadence against trunk (w/hunter or otherwise)
> > >>>>> 2) Integration w/ github merge-train pipelines
> > >>>>>
> > >>>>> Does that cover the highlights? I agree with these points as useful places for us to invest in as a project, and I'll work on getting this into a gdoc for us to align on and discuss further this week.
> > >>>>>
> > >>>>> ~Josh
> > >>>>>
> > >>>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
> > >>>>>
> > >>>>>> There's an old joke: How many people read Slashdot? The answer is 5. The rest of us just write comments without reading... In that spirit, I wanted to share some thoughts in response to your question, even if I know some of it will have been said in this thread already :-)
> > >>>>>>
> > >>>>>> Basically, I just want to share what has worked well in my past projects...
> > >>>>>>
> > >>>>>> Visualization: Now that we have Butler running, we can already see a decline in failing tests for 4.0 and trunk! This shows that contributors want to do the right thing; we just need the right tools and processes to achieve success.
> > >>>>>>
> > >>>>>> Process: I'm confident we will soon be back to seeing 0 failures for 4.0 and trunk. However, keeping that state requires constant vigilance! At Mongodb we had a role called Build Baron (aka Build Cop, etc...). This is a weekly rotating role where the person who is the Build Baron will at least once per day go through all of the Butler dashboards to catch new regressions early. We have used the same process also at Datastax to guard our downstream fork of Cassandra 4.0. It's the responsibility of the Build Baron to:
> > >>>>>> - file a jira ticket for new failures
> > >>>>>> - determine which commit is responsible for introducing the regression. Sometimes this is obvious, sometimes this requires "bisecting" by running more builds e.g. between two nightly builds.
> > >>>>>> - assign the jira ticket to the author of the commit that introduced the regression
> > >>>>>>
> > >>>>>> Given that Cassandra is a community that includes part time and volunteer developers, we may want to try some variation of this, such as pairing 2 build barons each week?
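Where the offending commit is not obvious, the "bisecting by running more builds" step mentioned above can be scripted with git bisect. A minimal sketch, assuming the two bounding nightly commits are known and using a hypothetical test name and build invocation:

# Binary-search the commits between a good and a bad nightly, rerunning the test each step.
git bisect start
git bisect bad  <first-bad-nightly-sha>
git bisect good <last-good-nightly-sha>
git bisect run ant test -Dtest.name=SomeRegressedTest
git bisect reset    # return to the original checkout when finished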
> > >>>>>>
> > >>>>>> Reverting: A policy that the commit causing the regression is automatically reverted can be scary. It takes courage to be the junior test engineer who reverts yesterday's commit from the founder and CTO, just to give an example... Yet this is the most efficient way to keep the build green. And it turns out it's not that much additional work for the original author to fix the issue and then re-merge the patch.
> > >>>>>>
> > >>>>>> Merge-train: For any project with more than 1 commit per day, it will inevitably happen that you need to rebase a PR before merging, and even if it passed all tests before, after rebase it won't. In the downstream Cassandra fork previously mentioned, we have tried to enable a github rule which requires a) that all tests passed before merging, and b) the PR is against the head of the branch merged into, and c) the tests were run after such rebase. Unfortunately this leads to infinite loops where a large PR may never be able to commit because it has to be rebased again and again when smaller PRs can merge faster. The solution to this problem is to have an automated process for the rebase-test-merge cycle. Gitlab supports such a feature and calls it merge-train:
> > >>>>>> https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
> > >>>>>>
> > >>>>>> The merge-train can be considered an advanced feature and we can return to it later. The other points should be sufficient to keep a reasonably green trunk.
> > >>>>>>
> > >>>>>> I guess the major area where we can improve daily test coverage would be performance tests. To that end we recently open sourced a nice tool that can algorithmically detect performance regressions in a timeseries history of benchmark results:
> > >>>>>> https://github.com/datastax-labs/hunter
> > >>>>>> Just like with correctness testing, it's my experience that catching regressions the day they happened is much better than trying to do it at beta or rc time.
> > >>>>>>
> > >>>>>> Piotr also blogged about Hunter when it was released:
> > >>>>>> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
> > >>>>>>
> > >>>>>> henrik
> > >>>>>>
> > >>>>>> On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org> wrote:
> > >>>>>>
> > >>>>>>> We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for awhile:
> > >>>>>>>
> > >>>>>>> 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc)? Something else entirely?
> > >>>>>>>
> > >>>>>>> 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
> > >>>>>>>
> > >>>>>>> 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release + stabilization focus by the community, i.e. the 4.0 push or the tock in tick-tock)?
> > >>>>>>>
> > >>>>>>> Given the large volumes of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
> > >>>>>>>
> > >>>>>>> Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
> > >>>>>>>
> > >>>>>>> Looking forward to hearing what people think.
> > >>>>>>>
> > >>>>>>> ~Josh
> > >>>>>>
> > >>>>>> --
> > >>>>>> Henrik Ingo