> the need for some external pressure to maintain build quality, and the
> best solution proposed (to my mind) was the use of GitHub actions to
> integrate with various CI services to refuse PRs that do not have a clean
> test run

Honestly, I agree 100% with this. I took the more conservative approach (refine and standardize what we have + reduce friction) but I've long been a believer in intentionally setting up incentives and disincentives to shape behavior. So let me pose the question here to the list: is there anyone who would like to advocate for the current merge strategy (apply to oldest LTS, merge up, often -s ours w/new patch applied + amend) instead of "apply to trunk and cherry-pick back to LTS"? If we make this change we'll be able to integrate w/github actions and block merge on green CI + integrate git revert into our processes.
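For concreteness, the "apply to trunk and cherry-pick back to LTS" flow might look roughly like the following. This is a minimal sketch only: the feature branch name and commit hashes are placeholders, and the CI gates referenced are the ones discussed later in the thread.

# Sketch: land the change on trunk first (after review and a green CI run).
git checkout trunk
git merge --ff-only CASSANDRA-XXXXX-trunk   # placeholder feature branch name
git push origin trunk

# Re-apply the same commit to whichever LTS branches need the fix.
# -x records "cherry picked from commit ..." in the message for traceability.
git checkout cassandra-4.0
git cherry-pick -x <trunk-commit-sha>
git push origin cassandra-4.0

# If a landed commit later breaks CI, prefer reverting to leaving the board red;
# the author re-lands once the branch is green again.
git checkout trunk
git revert <offending-commit-sha>
git push origin trunk

Compared to the current merge-up strategy, each branch ends up with an independent, individually CI-testable commit, which is what makes PR-level gating and git revert straightforward.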
On Tue, Dec 7, 2021 at 9:08 AM bened...@apache.org <bened...@apache.org> wrote:

> > My personal opinion is we'd be well served to do trunk-based development with cherry-picks … to LTS release branches
>
> Agreed.
>
> > that's somewhat orthogonal … to the primary thing this discussion surfaced for me
>
> The primary outcome of the discussion for me was the need for some external pressure to maintain build quality, and the best solution proposed (to my mind) was the use of GitHub actions to integrate with various CI services to refuse PRs that do not have a clean test run. This doesn’t fully resolve flakiness, but it does provide 95%+ of the necessary pressure to maintain test quality, and a consistent way of determining that.
>
> This is how a lot of other projects maintain correctness, and I think how many forks of Cassandra are maintained outside of the project as well.
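As a rough illustration of the kind of gate being described here, GitHub branch protection can refuse to merge a PR until named status checks report green. This is a sketch of the mechanism only: the check names and repository path are placeholders, and an ASF project would normally route such configuration through ASF infrastructure rather than calling the API directly.

# Sketch only: require green CI checks before a PR can merge into trunk.
cat > protection.json <<'EOF'
{
  "required_status_checks": {
    "strict": true,
    "contexts": ["ci/unit-tests-jdk8", "ci/unit-tests-jdk11", "ci/python-dtests"]
  },
  "enforce_admins": true,
  "required_pull_request_reviews": null,
  "restrictions": null
}
EOF
gh api -X PUT repos/<org>/<repo>/branches/trunk/protection --input protection.json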
> From: Joshua McKenzie <jmcken...@apache.org>
> Date: Tuesday, 7 December 2021 at 13:08
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Releasable trunk and quality
>
> > it would be far preferable for consistency of behaviour to rely on shared infrastructure if possible
>
> For those of us using CircleCI, we can get a lot of the benefit by having a script that rewrites and cleans up circle profiles based on use-case; it's a shared / consistent environment and the scripting approach gives us flexibility to support different workflows with minimal friction (build and run every push vs. click to trigger, for example).
>
> Is there a reason we discounted modifying the merge strategy?
>
> I took a stab at enumerating some of the current "best in class" approaches I could find here:
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.9b52fp49pp3y
> My personal opinion is we'd be well served to do trunk-based development, cherry-picking (and by that I mean basically re-applying) bugfixes back to LTS release branches (or perhaps doing the bugfix on the oldest LTS and applying up, tomato tomahto), doing away with merge commits, and using git revert more liberally when a commit breaks CI or introduces instability into it.
>
> All that said, that's somewhat orthogonal (or perhaps complementary) to the primary thing this discussion surfaced for me, which is that we don't have standardization or guidance across what tests, on what JDK's, with what config, etc. we run before commits today. My thinking is to get some clarity for everyone on that front, reduce friction to encourage that behavior, and then visit the merge strategy discussion independently after that.
>
> ~Josh
>
> On Tue, Dec 7, 2021 at 1:08 AM Berenguer Blasi <berenguerbl...@gmail.com> wrote:
>
> > +1. I would add a 'post-commit' step: check the Jenkins CI run for your merge and see if something broke regardless.
>
> > On 6/12/21 23:51, Ekaterina Dimitrova wrote:
> > > Hi Josh,
> > > All good questions, thank you for raising this topic. To the best of my knowledge, we don't have those documented, but I will put notes on what tribal knowledge I know about and personally follow :-)
> > >
> > > Pre-commit test suites:
> > > * Which JDK's? - both are officially supported, so both.
> > > * When to include all python tests or do JVM only (if ever)? - probably only if I'm testing a test fix.
> > > * When to run upgrade tests? - I haven't heard any definitive guideline. Preferably every time, but if there is a tiny change I guess it can be decided for them to be skipped. I would advocate to do more than less.
> > > * What to do if a test is also failing on the reference root (i.e. trunk, cassandra-4.0, etc)? - check if a ticket exists already; if not, open one at least, even if I don't plan to work on it, to acknowledge the issue and add any info I know about. If we know who broke it, ping the author to check it.
> > > * What to do if a test fails intermittently? - Open a ticket. During investigation, use the CircleCI jobs for running tests in a loop to find when it fails or to verify the test was fixed. (This is already in my draft CircleCI document, not yet released as it was pending on the documents migration.)
> > >
> > > Hope that helps.
> > >
> > > ~Ekaterina
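A local pre-commit pass along the lines described above might look roughly like this. It is a sketch under stated assumptions: the ant property, the dtest flag, and the paths are illustrative rather than authoritative, so defer to the project's testing documentation for the exact invocations.

# Sketch: run the unit suite on each supported JDK (adjust paths to your installs).
export JAVA_HOME=/path/to/jdk8        # repeat the run with the JDK 11 install
ant test

# Python dtests live in the separate cassandra-dtest repo.
cd ../cassandra-dtest
pytest --cassandra-dir=../cassandra

# Chase an intermittent failure by looping one test until it breaks.
cd ../cassandra
for i in $(seq 1 50); do
    ant test -Dtest.name=SomeFlakyTest || break
done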
> > > On Mon, 6 Dec 2021 at 17:20, Joshua McKenzie <jmcken...@apache.org> wrote:
> > >
> > >> As I work through the scripting on this, I don't know if we've documented or clarified the following (don't see it here: https://cassandra.apache.org/_/development/testing.html):
> > >>
> > >> Pre-commit test suites:
> > >> * Which JDK's?
> > >> * When to include all python tests or do JVM only (if ever)?
> > >> * When to run upgrade tests?
> > >> * What to do if a test is also failing on the reference root (i.e. trunk, cassandra-4.0, etc)?
> > >> * What to do if a test fails intermittently?
> > >>
> > >> I'll also update the above linked documentation once we hammer this out, and try to bake it into the scripting flow as much as possible as well. The goal is to make it easy to do the right thing and hard to do the wrong thing, and to have these things written down rather than have it be tribal knowledge that varies a lot across the project.
> > >>
> > >> ~Josh
> > >>
> > >> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jmcken...@apache.org> wrote:
> > >>
> > >>> After some offline collab, here's where this thread has landed on a proposal to incrementally improve our processes and hopefully stabilize the state of CI longer term:
> > >>>
> > >>> Link:
> > >>> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
> > >>>
> > >>> Hopefully the mail server doesn't butcher formatting; if it does, hit up the gdoc and leave comments there as it should be open to all.
> > >>>
> > >>> Phase 1:
> > >>> Document merge criteria; update circle jobs to have a simple pre-merge job (one for each JDK profile)
> > >>>   * Donate, document, and formalize usage of circleci-enable.py in ASF repo (need new commit scripts / dev tooling section?)
> > >>>     * rewrites circle config jobs to simple clear flow
> > >>>     * ability to toggle between "run on push" or "click to run"
> > >>>     * Variety of other functionality; see below
> > >>> Document (site, help, README.md) and automate via scripting the relationship / dev / release process around:
> > >>>   * In-jvm dtest
> > >>>   * dtest
> > >>>   * ccm
> > >>> Integrate and document usage of script to build CI repeat test runs
> > >>>   * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
> > >>>   * Document “Do this if you add or change tests”
> > >>> Introduce “Build Lead” role
> > >>>   * Weekly rotation; volunteer
> > >>>   * 1: Make sure JIRAs exist for test failures
> > >>>   * 2: Attempt to triage new test failures to root cause and assign out
> > >>>   * 3: Coordinate and drive to green board on trunk
> > >>> Change and automate process for *trunk only* patches:
> > >>>   * Block on green CI (from merge criteria in CI above; potentially stricter definition of "clean" for trunk CI)
> > >>>   * Consider using github PR’s to merge (TODO: determine how to handle circle + CHANGES; see below)
> > >>> Automate process for *multi-branch* merges
> > >>>   * Harden / contribute / document dcapwell's script (he has one which does the following):
> > >>>     * rebases your branch to the latest (if on 3.0 then rebase against cassandra-3.0)
> > >>>     * check compiles
> > >>>     * removes all changes to .circle (can opt-out for circleci patches)
> > >>>     * removes all changes to CHANGES.txt and leverages JIRA for the content
> > >>>     * checks code still compiles
> > >>>     * changes circle to run ci
> > >>>     * push to a temp branch in git and run CI (circle + Jenkins)
> > >>>       * when all branches are clean (waiting step is manual)
> > >>>       * TODO: Define “clean”
> > >>>         * No new test failures compared to reference?
> > >>>         * Or no test failures at all?
> > >>>     * merge changes into the actual branches
> > >>>     * merge up changes; rewriting diff
> > >>>     * push --atomic
> > >>>
> > >>> Transition to phase 2 when:
> > >>>   * All items from phase 1 are complete
> > >>>   * Test boards for supported branches are green
> > >>>
> > >>> Phase 2:
> > >>>   * Add Harry to recurring run against trunk
> > >>>   * Add Harry to release pipeline
> > >>>   * Suite of perf tests against trunk recurring
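For concreteness, the multi-branch automation summarized in the outline above might look roughly like the sketch below. The branch naming, remote, and build command are assumptions for illustration; this is not the actual dcapwell script.

# Sketch: prep each branch of a multi-branch bugfix, run CI on temp branches,
# then land everything together once the boards are clean.
# Assumes hypothetical feature branches named CASSANDRA-XXXXX-3.0, -4.0, -trunk.
for base in cassandra-3.0 cassandra-4.0 trunk; do
    suffix=${base#cassandra-}                     # "3.0", "4.0", "trunk"
    git checkout CASSANDRA-XXXXX-${suffix}
    git rebase origin/${base}                     # rebase onto the latest base branch
    ant jar                                       # confirm it still compiles
    git checkout origin/${base} -- .circleci CHANGES.txt   # drop local circle/CHANGES edits
    git commit --amend --no-edit
    git push -f origin HEAD:ci/CASSANDRA-XXXXX-${suffix}   # temp branch for circle + Jenkins
done

# Manual waiting step: confirm every temp branch has a clean run, then land atomically.
git push --atomic origin \
    CASSANDRA-XXXXX-3.0:cassandra-3.0 \
    CASSANDRA-XXXXX-4.0:cassandra-4.0 \
    CASSANDRA-XXXXX-trunk:trunk

The push --atomic at the end is what makes the per-branch commits land together or not at all, which matches the "waiting step is manual" note in the outline.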
> > >>> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jmcken...@apache.org> wrote:
> > >>>
> > >>>> Sorry for not catching that Benedict, you're absolutely right. So long as we're using merge commits between branches, I don't think auto-merging via train or blocking on green CI are options via the tooling, and multi-branch reverts will be something we should document very clearly should we even choose to go that route (a lot of room to make mistakes there).
> > >>>>
> > >>>> It may not be a huge issue, as we can expect the more disruptive (i.e. potentially destabilizing) changes to be happening on trunk only, so perhaps we can get away with slightly different workflows or policies based on whether you're doing a multi-branch bugfix or a feature on trunk. Bears thinking more deeply about.
> > >>>>
> > >>>> I'd also be game for revisiting our merge strategy. I don't see much difference in labor between merging between branches vs. preparing separate patches for an individual developer; however, I'm sure there are maintenance and integration implications there I'm not thinking of right now.
> > >>>>
> > >>>> On Wed, Nov 17, 2021 at 12:03 PM bened...@apache.org <bened...@apache.org> wrote:
> > >>>>
> > >>>>> I raised this before, but to highlight it again: how do these approaches interface with our merge strategy?
> > >>>>>
> > >>>>> We might have to rebase several dependent merge commits and want to merge them atomically. So far as I know these tools don’t work fantastically in this scenario, but if I’m wrong that’s fantastic. If not, given how important these things are, should we consider revisiting our merge strategy?
> > >>>>>
> > >>>>> From: Joshua McKenzie <jmcken...@apache.org>
> > >>>>> Date: Wednesday, 17 November 2021 at 16:39
> > >>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > >>>>> Subject: Re: [DISCUSS] Releasable trunk and quality
> > >>>>>
> > >>>>> Thanks for the feedback and insight Henrik; it's valuable to hear how other large, complex infra projects have tackled this problem set.
> > >>>>>
> > >>>>> To attempt to summarize, what I got from your email:
> > >>>>>
> > >>>>> [Phase one]
> > >>>>> 1) Build Barons: rotation where there's always someone active tying failures to changes and adding those failures to our ticketing system
> > >>>>> 2) Best effort process of "test breakers" being assigned tickets to fix the things their work broke
> > >>>>> 3) Moving to a culture where we regularly revert commits that break tests
> > >>>>> 4) Running tests before we merge changes
> > >>>>>
> > >>>>> [Phase two]
> > >>>>> 1) Suite of performance tests on a regular cadence against trunk (w/hunter or otherwise)
> > >>>>> 2) Integration w/ github merge-train pipelines
> > >>>>>
> > >>>>> Does that cover the highlights? I agree with these points as useful places for us to invest in as a project, and I'll work on getting this into a gdoc for us to align on and discuss further this week.
> > >>>>>
> > >>>>> ~Josh
> > >>>>>
> > >>>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
> > >>>>>
> > >>>>>> There's an old joke: How many people read Slashdot? The answer is 5. The rest of us just write comments without reading... In that spirit, I wanted to share some thoughts in response to your question, even if I know some of it will have been said in this thread already :-)
> > >>>>>>
> > >>>>>> Basically, I just want to share what has worked well in my past projects...
> > >>>>>>
> > >>>>>> Visualization: Now that we have Butler running, we can already see a decline in failing tests for 4.0 and trunk! This shows that contributors want to do the right thing; we just need the right tools and processes to achieve success.
> > >>>>>>
> > >>>>>> Process: I'm confident we will soon be back to seeing 0 failures for 4.0 and trunk. However, keeping that state requires constant vigilance! At Mongodb we had a role called Build Baron (aka Build Cop, etc...). This is a weekly rotating role where the person who is the Build Baron will at least once per day go through all of the Butler dashboards to catch new regressions early. We have used the same process also at Datastax to guard our downstream fork of Cassandra 4.0. It's the responsibility of the Build Baron to:
> > >>>>>> - file a jira ticket for new failures
> > >>>>>> - determine which commit is responsible for introducing the regression. Sometimes this is obvious, sometimes this requires "bisecting" by running more builds e.g. between two nightly builds.
> > >>>>>> - assign the jira ticket to the author of the commit that introduced the regression
> > >>>>>>
> > >>>>>> Given that Cassandra is a community that includes part time and volunteer developers, we may want to try some variation of this, such as pairing 2 build barons each week?
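Where the offending commit is not obvious, the "bisecting by running more builds" step mentioned above can be scripted with git bisect. A minimal sketch, assuming the two bounding nightly commits are known and using a hypothetical test name and build invocation:

# Binary-search the commits between a good and a bad nightly, rerunning the test each step.
git bisect start
git bisect bad  <first-bad-nightly-sha>
git bisect good <last-good-nightly-sha>
git bisect run ant test -Dtest.name=SomeRegressedTest
git bisect reset    # return to the original checkout when finished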
> > >>>>>>
> > >>>>>> Reverting: A policy that the commit causing the regression is automatically reverted can be scary. It takes courage to be the junior test engineer who reverts yesterday's commit from the founder and CTO, just to give an example... Yet this is the most efficient way to keep the build green. And it turns out it's not that much additional work for the original author to fix the issue and then re-merge the patch.
> > >>>>>>
> > >>>>>> Merge-train: For any project with more than 1 commit per day, it will inevitably happen that you need to rebase a PR before merging, and even if it passed all tests before, after rebase it won't. In the downstream Cassandra fork previously mentioned, we have tried to enable a github rule which requires a) that all tests passed before merging, and b) the PR is against the head of the branch merged into, and c) the tests were run after such rebase. Unfortunately this leads to infinite loops where a large PR may never be able to commit because it has to be rebased again and again when smaller PRs can merge faster. The solution to this problem is to have an automated process for the rebase-test-merge cycle. Gitlab supports such a feature and calls it merge-train:
> > >>>>>> https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
> > >>>>>>
> > >>>>>> The merge-train can be considered an advanced feature and we can return to it later. The other points should be sufficient to keep a reasonably green trunk.
> > >>>>>>
> > >>>>>> I guess the major area where we can improve daily test coverage would be performance tests. To that end we recently open sourced a nice tool that can algorithmically detect performance regressions in a timeseries history of benchmark results:
> > >>>>>> https://github.com/datastax-labs/hunter
> > >>>>>> Just like with correctness testing, it's my experience that catching regressions the day they happened is much better than trying to do it at beta or rc time.
> > >>>>>>
> > >>>>>> Piotr also blogged about Hunter when it was released:
> > >>>>>> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
> > >>>>>>
> > >>>>>> henrik
> > >>>>>>
> > >>>>>> On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org> wrote:
> > >>>>>>
> > >>>>>>> We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for awhile:
> > >>>>>>>
> > >>>>>>> 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc)? Something else entirely?
> > >>>>>>>
> > >>>>>>> 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
> > >>>>>>>
> > >>>>>>> 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release + stabilization focus by the community, i.e. the 4.0 push or the tock in tick-tock)?
> > >>>>>>>
> > >>>>>>> Given the large volumes of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
> > >>>>>>>
> > >>>>>>> Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
> > >>>>>>>
> > >>>>>>> Looking forward to hearing what people think.
> > >>>>>>>
> > >>>>>>> ~Josh
> > >>>>>>
> > >>>>>> --
> > >>>>>> Henrik Ingo