Good news.

CI on 5.0 and trunk is working again, after an unexpected six-week
hiatus (and a string of general problems since last year).
This includes pre-commit for 5.0 and trunk working again.


More info…

From 5.0 we now have in-tree a Jenkinsfile that only relies on the in-tree
scripts – it does not depend upon cassandra-builds and all the individual
DSL-created stage jobs. This aligns how pre-commit and post-commit work.
More importantly, it makes our CI repeatable regardless of the fork/branch
of the code, or the jenkins installation.

For 5.0+ pre-commit, use the Cassandra-devbranch-5 job and make sure your
patch is after sha 3c85def.
The jenkinsfile now comes with pre-defined profiles; it's recommended to
use "skinny" until you need the final "pre-commit".  You can also use the
custom profile with a regexp when you need just specific test types.
See https://ci-cassandra.apache.org/job/Cassandra-devbranch-5/build
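
As a rough illustration of how a regexp narrows a run to specific test
types, here is a minimal Python sketch.  The test type names and the
full-match semantics are assumptions for illustration only; check the
in-tree Jenkinsfile for the real list and for how the pattern is applied.

```python
import re

# Hypothetical test type names, for illustration only.
TEST_TYPES = [
    "test", "test-cdc", "long-test",
    "jvm-dtest", "jvm-dtest-upgrade",
    "dtest", "dtest-novnode", "cqlsh-test",
]

def select_test_types(pattern, test_types=TEST_TYPES):
    """Return the test types whose whole name matches the pattern.

    Assumption: the pattern must match the full name; the real
    Jenkinsfile may apply the regexp differently.
    """
    regex = re.compile(pattern)
    return [t for t in test_types if regex.fullmatch(t)]

if __name__ == "__main__":
    print(select_test_types(r"jvm-dtest.*"))
```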

For pre-commit on older branches, you now use Cassandra-devbranch-before-5.

For both pre- and post-commit builds, each build now creates two new
sharable artefacts: ci_summary.html and results_details.tar.xz.
These are based on what Apple contributors were sharing from builds from
their internal CI system.  The format and contents of these files are
expected to evolve.

Each build now archives all its results and logs under one location in
nightlies.
e.g. https://nightlies.apache.org/cassandra/Cassandra-5.0/227/
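
A minimal sketch of building that location for a given build, assuming the
layout in the example above (job name, then build number) holds for other
jobs and builds.  The helper name is hypothetical.

```python
NIGHTLIES_BASE = "https://nightlies.apache.org/cassandra"

def nightlies_url(job, build_number):
    """Build the nightlies directory URL for one build.

    Assumption: every job follows the <base>/<job>/<build>/ layout seen
    in the single example from the announcement.
    """
    return f"{NIGHTLIES_BASE}/{job}/{build_number}/"

if __name__ == "__main__":
    print(nightlies_url("Cassandra-5.0", 227))
```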



The post-commit pipeline profile remains *very* heavy, at 130k+ tests.
The post-commit pipelines were previously ramped up to include everything,
given everything that's happening in both branches.  So they take time and
saturate everything they touch.  We need to re-evaluate what we need to be
testing to alleviate this.  There'll also be a new pattern of timeouts and
infra/script-related flakies, as happens whenever there's such a
significant change.  All the patience and help possible is appreciated!



Now that the jenkinsfile can be used on any jenkins server for any
fork/branch, the next work-in-progress is CASSANDRA-18145, to be able to
run the full pipeline with a single command line (given a k8s context,
i.e. ~/.kube/config).

We already have most of this working – it's possible to make a clone of
ci-cassandra.apache.org on k8s using this wip helm chart:
https://github.com/thelastpickle/Cassius
And we are already using this on an auto-scaling GKE k8s cluster – you
might have seen me posting the ci_summary.html and results_details.tar.xz
files to tickets for pre-commit CI instead of using the ci-cassandra.a.o or
CircleCI pre-commit links.  Already, we have a full pipeline time down to
two hours and less than a third of the cost of CircleCI, and there's
low-hanging fruit to further improve this.  For serious pre-commit testing
we still need repeatable test runs, ref CASSANDRA-18942.  On all this I'd
like to give a special shout-out to Aleksandr Volochnev, who was
instrumental in the final (and helm-based) work of 18145, which was needed
to be able to test its prerequisite ticket CASSANDRA-18594 –
ci-cassandra.a.o would not be running again today without his recent time
spent on it.

On a separate note, this new jenkinsfile is designed in preparation for
CASSANDRA-18731 ('Add declarative root CI structure'), to make it easier to
define profiles, tests, and their infrastructural requirements.


To the community…
  We are now in a place where we are looking for and requesting further
donations of servers to the ci-cassandra.apache.org jenkins cluster.  We
can now also use cloud/instance credits to host auto-scaling k8s-based
ci-cassandra.a.o clones that would be available for community pre-commit
testing.
  There's plenty of low-hanging fruit improvements available if folk want
to get involved.  Performance and throughput of splits is an important area
as it has a big impact on reducing costs of a whole pipeline run  (there's
nothing like knowing you saved another $5 every time you clicked a
button).  And if you can just start using the in-tree test scripts more,
that helps a lot.
