A huge amount of work and time went into this, and it's going to have a big impact on the project. I want to offer heartfelt thanks to all involved for the focus and energy that went into this!
As the author of the system David lovingly dubbed "JoshCI" (/sigh), I definitely want to see us all converge as much as possible on the CI code we're running. While I remain convinced that something like CASSANDRA-18731 is vital for hygiene in the long run (unit testing our CI, declaratively defining atoms of build logic independently from flow), I also think there'd be significant value in more of us moving towards the Jenkinsfile wherever possible.

Seriously, thanks again for all this work, everyone. CI on Cassandra is a Big Data Problem, and not an easy one.

On Sun, Apr 28, 2024, at 10:22 AM, Mick Semb Wever wrote:
> Good news.
>
> CI on 5.0 and trunk is working again, after an unexpected six-week hiatus (and a string of general problems since last year). This includes pre-commit for 5.0 and trunk working again.
>
> More info…
>
> From 5.0 we now have an in-tree Jenkinsfile that relies only on the in-tree scripts; it does not depend upon cassandra-builds and all the individual DSL-created stage jobs. This aligns how pre-commit and post-commit work. More importantly, it makes our CI repeatable regardless of the fork/branch of the code or the Jenkins installation.
>
> For 5.0+ pre-commit, use Cassandra-devbranch-5 and make sure your patch is after sha 3c85def.
> The Jenkinsfile now comes with pre-defined profiles; it's recommended to use "skinny" until you need the final "pre-commit". You can also use the custom profile with a regexp when you need just specific test types.
> See https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/build
>
> For pre-commit on older branches, you now use Cassandra-devbranch-before-5.
>
> For both pre- and post-commit builds, each build now creates two new shareable artefacts: ci_summary.html and results_details.tar.xz
> These are based on what Apple contributors were sharing from builds on their internal CI system. The format and contents of these files are expected to evolve.
>
> Each build now archives all its results and logs under one location in nightlies,
> e.g. https://nightlies.apache.org/cassandra/Cassandra-5.0/227/
>
> The post-commit pipeline profile remains *very* heavy, at 130k+ tests. These pipelines were previously ramped up to include everything, given everything that's happening in both branches, so they take time and saturate everything they touch. We need to re-evaluate what we need to be testing to alleviate this. There'll also be a new pattern of timeouts and infra/script-related flakies, as happens whenever there's such a significant change; all the patience and help possible is appreciated!
>
> Now that the Jenkinsfile can be used on any Jenkins server for any fork/branch, the next work-in-progress is CASSANDRA-18145: being able to run the full pipeline with a single command line (given a k8s context, i.e. ~/.kube/config).
>
> We already have most of this working; it's possible to make a clone of ci-cassandra.apache.org on k8s using this WIP helm chart:
> https://github.com/thelastpickle/Cassius
> And we are already using this on an auto-scaling GKE k8s cluster. You might have seen me posting the ci_summary.html and results_details.tar.xz files to tickets for pre-commit CI instead of using the ci-cassandra.a.o or CircleCI pre-commit links. We already have a full pipeline run down to two hours, at less than a third of the cost of CircleCI, and there's low-hanging fruit to improve this further. For serious pre-commit testing we still need repeatable test runs, ref CASSANDRA-18942. On all this I'd like to give a special shout-out to Aleksandr Volochnev, who was instrumental in the final (helm-based) work of 18145, which was needed to be able to test its prerequisite ticket CASSANDRA-18594. ci-cassandra.a.o would not be running again today without his recent time spent on it.
>
> On a separate note, this new Jenkinsfile is designed in preparation for CASSANDRA-18731 ('Add declarative root CI structure'), to make it easier to define profiles, tests, and their infrastructural requirements.
>
> To the community…
> We are now looking for, and requesting, further donations of servers to the ci-cassandra.apache.org Jenkins cluster. We can now also use cloud/instance credits to host auto-scaling k8s-based ci-cassandra.a.o clones that would be available for community pre-commit testing.
> There are plenty of low-hanging-fruit improvements available if folk want to get involved. Performance and throughput of splits is an important area, as it has a big impact on reducing the cost of a whole pipeline run (there's nothing like knowing you saved another $5 every time you clicked a button). And if you can just start using the in-tree test scripts more, that helps a lot.