[jira] [Commented] (CASSANDRA-18731) Add declarative root CI structure

Ekaterina Dimitrova (Jira) Wed, 29 Nov 2023 09:32:18 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791208#comment-17791208
 ]


Ekaterina Dimitrova commented on CASSANDRA-18731:
-------------------------------------------------

{quote} - simulator-dtest isn't running jdk17 because that's currently broken, 
ref CASSANDRA-18616{quote}
For the record, it is *not broken*; it *does not support* JDK17. There is a 
difference. 
{quote}I think we should revisit this on the dev ML; I don't fully agree w/what 
you've outlined above and this is something we should discuss as a project 
community, not mandate on a comment in a JIRA.
{quote}
Agreed, let's revisit it after the release and the summit. Also, there are 
currently breakages in Jenkins and CircleCI, and they take priority. 
{quote}meaning the python dtest suites and jvm-dtest-upgrade should likely be 
calibrated to "highest supported and down", vs. "lowest supported and up".
{quote}
I am not sure I understand what you meant here, [~jmckenzie], so I will 
reiterate what I had in mind. 

I think we should do at least a JDK11 build and a JDK17 build; the runtimes for 
tests should then be JDK11 and JDK17, both on the JDK11 build. This is also what 
we ship to users and what they use: a JDK11 build that they can run on JDK11 or 
JDK17. Of course, 11 and 17 will be replaced by the next JDK versions when we 
shift; they are used here just for the sake of the example. 
{quote}Further, I don't agree with (and we've discussed) the need to run 
test-cdc (or it existing at all vs. it just being enabled for all tests), the 
compression, oa, or trie suites as pre-commit smoke suites, etc. I have a 
strong belief that we should expect very infrequent test failures on 
non-default configurations vs. the base; that functionality is supposed to be 
API compatible and if it works on one configuration, it works on all. While 
there will no doubt be flakes, the amount of compute required to run all those 
pre-commit as a smoke suite effectively removes the "smoke" aspect of it and 
just makes pre-commit a vast majority of the cost of a full run.
{quote}
I think it was already agreed not to run cdc, oa, keyspaces, tries pre-commit? 
The only reason we have some of those in the pre-commit runs in CircleCI now is 
that they were not added in Jenkins until recently. So if we did not run them 
pre-commit, they would not be run post-commit either. Now that this has 
changed, I will open a ticket so that they are no longer mandatory pre-commit in 
CircleCI. *Disclaimer:* the pre-commit workflow in CircleCI just points to what 
should be the minimum set of job suites to run pre-commit. My understanding, 
though, is that it is part of the committers' responsibility to judge during 
review whether more tests should be run. Example: I can see upgrade tests were 
run for TCM pre-commit, as it is obvious that the patch can break upgrades, so 
the authors/reviewers ensured those tests were not broken pre-commit. (I believe 
there was some glitch, but it was something small and overall they were cleaned 
up.)
{quote}These all come from in-tree scripts, which must be the shared basis for 
any CI system. Moving forward there should not be any excuse that any CI system 
doesn't do the artifacts, deb+rpm packaging, and checks.
{quote}
Agreed, that was the idea behind the repeatable CI project: one source of 
truth. Even before that, when we worked on preparing CircleCI to be the source 
of truth for releases due to Jenkins instability, I was personally checking and 
aligning cassandra-builds and the CircleCI scripts so that they run the same 
commands. 
{quote}We have a ticket to add it to ASF CI (in-tree scripts), and it was only 
put on hold because this ticket was proposing a better approach.
{quote}
That is also my understanding; [~Bereng] already had a patch a few weeks ago.
{quote}Same for testing everything in no-vnode and vnode combination; that's so 
terribly wasteful that I just can't agree with doing that for everything 
pre-commit (in any env, circle, new, or ASF-CI). I would very much prefer we 
take the approach you enumerated on the dev ML:
{quote}
That should be discussed on the ML too.
{quote}And for repeated runs, we discussed the fact that not having support for 
repeated runs in ASF CI meant we also didn't want to put down a hard blocker on 
requiring those runs on other pre-commit environments.
{quote}
[~jmckenzie], where was this agreed? 
{quote}using the above restricted smoke heuristic, and many of them look to be 
circle ci env specific and not asf ci.
{quote}
[~jmckenzie], I am not sure I understand what you mean. I saw there was a full 
CircleCI run pre-commit for TCM. Mick ran all possible tests and Sam opened 
tickets for them. Also, I was triaging the trunk test failures last night, 
bisecting and checking what is TCM-related and what is not; there were 1 or 2 
OOMs, and I do not see any environment-related failures. Maybe you had a 
different run in mind?
{quote}If we're confident that CI on the latest JDK is better, sure. But I'm 
not sold yet, because there are failures when only coding and testing on 
highest JDK beyond the simple compile/classpath type.
{quote}
[~mck], I am not sure I correctly understand this point and what Josh 
mentioned about JDKs, but please check what I suggested at the beginning of 
this post. I do not advocate for anything being run *only* on the top JDK 
version. 
{quote}AFAIK a number of the breakages TCM created were because tries and 
vnodes were not also run. This is the proof. We need to improve it (more 
selective to what tests are applicable and to be run within each variation, 
etc), but we gotta work with what we know catches failures today.
{quote}
Patches of the calibre of TCM that touch almost everything should be expected 
to be committed only after running all possible tests, IMHO. I did the same 
when I was adding the configuration changes, which touched everything. The 
JDK migration also ensured all tests were checked pre-commit. I think there is 
a difference between a tiny patch that we are sure cannot affect upgrades, for 
example, and huge patches like TCM, storage engine, etc., which are expected 
to shake everything.
{quote}I am entirely in support of TCM merging when it did – improving every 
chance of a release with TCM and Accord in it before the Summit is a trade-off 
in favour of temporary CI headaches I'll take, and TCM devs have been _very_ 
quick to restore CI. Right trade-off, very temporary pain, pragmatic. The 
Summit is an important opportunity to get TCM+Accord into more hands and user 
testing. 
{quote}
I do not share the sentiment. I already expressed my opinion on the ML during 
the release discussions, so I am not going to repeat it here. 

> Add declarative root CI structure
> ---------------------------------
>
>                 Key: CASSANDRA-18731
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18731
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CI
>            Reporter: Josh McKenzie
>            Assignee: Josh McKenzie
>            Priority: Normal
>             Fix For: 5.x
>
>
> Currently we have a somewhat declarative structure in .circleci, however 
> there's still quite a bit that's baked in (resource limitations, parallelism, 
> which suites qualify for which pipelines (pre-commit vs. post-commit vs. 
> ???), etc). Further, while CASSANDRA-18133 brings the build scripts in-tree, 
> all these parameters (pipelines, env vars, job definitions, resource 
> constraints) are all still scattered throughout the shell scripts and/or 
> reliant on the {{JenkinsFile}} to determine what suites comprise what 
> pipelines.
> This ticket aims to decouple the definition of pipelines and jobs for CI from 
> the implementations themselves. The goal here is to define, establish, and 
> test both the base config and some helper methods to provide _other_ 
> configurations (circle, ASF CI, etc) the tools they need to programmatically 
> inherit base CI config from a purely declarative structure.
> Follow-up tickets will involve rewriting the in-tree build scripts and 
> JenkinsFile generation to rely on this structure, as well as integrating the 
> config parsing unit tests into our CI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)