💯! Amazing work - thanks so much for posting the details, Mick, and Josh is right on. Kinda bummed I haven't been following C* CI dev, being more on the ops side lately. Posting this up has me intrigued, so I may just have to go poke around some and scratch an itch :)
Warm regards,
Michael

On Sun, Apr 28, 2024 at 9:08 PM Josh McKenzie <jmcken...@apache.org> wrote:

> A huge amount of work and time went into this and it's going to have a big
> impact on the project. I want to offer a heartfelt thanks to all involved
> for the focus and energy that went into this!
>
> As the author of the system David lovingly dubbed "JoshCI" (/sigh), I
> definitely want to see us all move to converge as much as possible on the
> CI code we're running. While I remain convinced something like
> CASSANDRA-18731 is vital for hygiene in the long run (unit testing our CI,
> declaratively defining atoms of build logic independently from flow), I
> also think there'd be significant value in more of us moving towards using
> the JenkinsFile where at all possible.
>
> Seriously - thanks again for all this work everyone. CI on Cassandra is a
> Big Data Problem, and not an easy one.
>
> On Sun, Apr 28, 2024, at 10:22 AM, Mick Semb Wever wrote:
>
> Good news.
>
> CI on 5.0 and trunk is working again, after an unexpected 6-week
> hiatus (and a string of general problems since last year).
> This includes pre-commit for 5.0 and trunk working again.
>
> More info…
>
> From 5.0 we now have in-tree a Jenkinsfile that only relies on the in-tree
> scripts – it does not depend upon cassandra-builds and all the individual
> dsl-created stage jobs. This aligns how pre-commit and post-commit work.
> More importantly, it makes our CI repeatable regardless of the fork/branch
> of the code, or the jenkins installation.
>
> For 5.0+ pre-commit use the Cassandra-devbranch-5 and make sure your patch
> is after sha 3c85def
> The jenkinsfile now comes with pre-defined profiles; it's recommended to
> use "skinny" until you need the final "pre-commit". You can also use the
> custom profile with a regexp when you need just specific test types.
> See https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/build
>
> For pre-commit on older branches, you now use Cassandra-devbranch-before-5
>
> For both pre- and post-commit builds, each build now creates two new
> sharable artefacts: ci_summary.html and results_details.tar.xz
> These are based on what Apple contributors were sharing from builds from
> their internal CI system. The format and contents of these files are
> expected to evolve.
>
> Each build now archives its results and logs all under one location in
> nightlies.
> e.g. https://nightlies.apache.org/cassandra/Cassandra-5.0/227/
>
> The post-commit pipeline profile remains *very* heavy, at 130k+ tests.
> These were previously ramped up to include everything in their pipelines,
> given everything that's happening in both branches. So they take time and
> saturate everything they touch. We need to re-evaluate what we need to be
> testing to alleviate this. There'll also be a new pattern of timeouts and
> infra/script-related flakies, as happens whenever there's such a
> significant change. All the patience and help possible is appreciated!
>
> Now that the jenkinsfile can be used on any jenkins server for any
> fork/branch, the next work-in-progress is CASSANDRA-18145, to be able to
> run the full pipeline with a single command line (given a k8s context
> (~/.kube/config)).
>
> We already have most of this working – it's possible to make a clone of
> ci-cassandra.apache.org on k8s using this wip helm chart:
> https://github.com/thelastpickle/Cassius
> And we are already using this on an auto-scaling gke k8s cluster – you
> might have seen me posting the ci_summary.html and results_details.tar.xz
> files to tickets for pre-commit CI instead of using the ci-cassandra.a.o or
> circleci pre-commit links. Already, we have a full pipeline time down to
> two hours and less than a third of the cost of CircleCI, and there's
> low-hanging fruit to further improve this.
> For serious pre-commit testing we are still missing, and need,
> repeatable test runs, ref CASSANDRA-18942. On all this I'd like
> to give a special shout out to Aleksandr Volochnev who was instrumental in
> the final (and helm-based) work of 18145 which was needed to be able to
> test its prerequisite ticket CASSANDRA-18594 – ci-cassandra.a.o would not
> be running again today without his recent time spent on it.
>
> On a separate note, this new jenkinsfile is designed in preparation for
> CASSANDRA-18731 ('Add declarative root CI structure'), to make it easier to
> define profiles, tests, and their infrastructural requirements.
>
> To the community…
> We are now in a place where we are looking for and requesting further
> donations of servers to the ci-cassandra.apache.org jenkins cluster. We
> can now also use cloud/instance credits to host auto-scaling k8s-based
> ci-cassandra.a.o clones that would be available for community pre-commit
> testing.
> There are plenty of low-hanging-fruit improvements available if folk want
> to get involved. Performance and throughput of splits is an important area
> as it has a big impact on reducing costs of a whole pipeline run (there's
> nothing like knowing you saved another $5 every time you clicked a
> button). And if you can just start using the in-tree test scripts more,
> that helps a lot.