A huge amount of work and time went into this and it's going to have a big 
impact on the project. I want to offer a heartfelt thanks to all involved for 
the focus and energy that went into this!

As the author of the system David lovingly dubbed "JoshCI" (/sigh), I 
definitely want to see us all move to converge as much as possible on the CI 
code we're running. While I remain convinced something like CASSANDRA-18731 is 
vital for hygiene in the long run (unit testing our CI, declaratively defining 
atoms of build logic independently from flow), I also think there'd be 
significant value in more of us moving towards using the Jenkinsfile wherever 
possible.

Seriously - thanks again for all this work everyone. CI on Cassandra is a Big 
Data Problem, and not an easy one.

On Sun, Apr 28, 2024, at 10:22 AM, Mick Semb Wever wrote:
> 
> Good news.
> 
> CI on 5.0 and trunk is working again, after an unexpected six-week hiatus 
> (and a string of general problems since last year).
> This includes pre-commit for 5.0 and trunk working again.
> 
> 
> More info…
> 
> From 5.0 onwards we have an in-tree Jenkinsfile that relies only on the 
> in-tree scripts – it does not depend on cassandra-builds or on the individual 
> DSL-created stage jobs. This aligns how pre-commit and post-commit work. More 
> importantly, it makes our CI repeatable regardless of the fork/branch of the 
> code, or the jenkins installation.
> 
> For 5.0+ pre-commit, use the Cassandra-devbranch-5 job and make sure your 
> patch is after sha 3c85def.
> The jenkinsfile now comes with pre-defined profiles; it's recommended to use 
> "skinny" until you need the final "pre-commit". You can also use the custom 
> profile with a regexp when you need just specific test types.
> See https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/build
> 
> For pre-commit on older branches, you now use Cassandra-devbranch-before-5.
> 
> For both pre- and post-commit builds, each build now creates two new sharable 
> artefacts: ci_summary.html and results_details.tar.xz.
> These are based on what Apple contributors were sharing from builds on their 
> internal CI system. The format and contents of these files are expected to 
> evolve.
> 
> Each build now archives its results and logs all under one location in 
> nightlies.
> 
> e.g. https://nightlies.apache.org/cassandra/Cassandra-5.0/227/ 
> 
> 
> 
> The post-commit pipeline profile remains *very* heavy, at 130k+ tests. These 
> were previously ramped up to include everything in their pipelines, given 
> everything that's happening in both branches. So they take time and 
> saturate everything they touch. We need to re-evaluate what we need to be 
> testing to alleviate this. There'll also be a new pattern of timeouts and 
> infra/script-related flakies, as happens whenever there's such a significant 
> change. All the patience and help possible is appreciated!
> 
> 
> 
> Now that the jenkinsfile can be used on any jenkins server for any 
> fork/branch, the next work-in-progress is CASSANDRA-18145, to be able to run 
> the full pipeline with a single command line (given a k8s context 
> (~/.kube/config)).
>   
> We already have most of this working – it's possible to make a clone of 
> ci-cassandra.apache.org on k8s using this wip helm chart: 
> https://github.com/thelastpickle/Cassius 
> And we are already using this on an auto-scaling gke k8s cluster – you might 
> have seen me posting the ci_summary.html and results_details.tar.xz files to 
> tickets for pre-commit CI instead of using the ci-cassandra.a.o or circleci 
> pre-commit links. We already have the full pipeline time down to two hours, 
> at less than a third of the cost of CircleCI, and there's low-hanging fruit 
> to further improve this. For serious pre-commit testing we still need 
> repeatable test runs, ref CASSANDRA-18942. On all this I'd like to give a 
> special shout out to Aleksandr Volochnev, who was instrumental in the final 
> (and helm based) work of 18145, which was needed to be able to test its 
> prerequisite ticket CASSANDRA-18594 – ci-cassandra.a.o would not be running 
> again today without his recent time spent on it.
> 
> On a separate note, this new jenkinsfile is designed in preparation for 
> CASSANDRA-18731 ('Add declarative root CI structure'), to make it easier to 
> define profiles, tests, and their infrastructural requirements.
> 
> 
> To the community…
>   We are now in a place where we are looking for and requesting further 
> donations of servers to the ci-cassandra.apache.org jenkins cluster. We can 
> now also use cloud/instance credits to host auto-scaling k8s-based 
> ci-cassandra.a.o clones that would be available for community pre-commit 
> testing.
>   There are plenty of low-hanging-fruit improvements available if folk want 
> to get involved. Performance and throughput of splits is an important area, 
> as it has a big impact on reducing the cost of a whole pipeline run (there's 
> nothing like knowing you saved another $5 every time you clicked a button). 
> And if you can just start using the in-tree test scripts more, that helps a 
> lot.
> 
> 
> 
