Great summary Josh,

> - JDK-based test suites on highest supported JDK using other config

Do you mean a smoke test suite by that ^ ?
- - -- --- ----- -------- -------------
Jacek Lewandowski


On Thu, 15 Feb 2024 at 18:12, Josh McKenzie <jmcken...@apache.org> wrote:

> > Would it make sense to only block commits on the test strategy you've listed, and shift the entire massive test suite to post-commit?
> >
> > Lots and lots of other emails
>
> ;)
>
> There's an interesting broad question of: What config do we consider "recommended" going forward, the "conservative" (i.e. old) or the "performant" (i.e. new)? And what JDK do we consider "recommended" going forward, the oldest we support or the newest?
>
> Since those recommendations apply for new clusters, people need to qualify their setups, and we have a high bar of quality on testing pre-merge, my gut tells me "performant + newest JDK". This would impact what we'd test pre-commit IMO.
>
> Having been doing a lot of CI stuff lately, some observations:
>
> - Our True North needs to be releasing a database that's free of defects that violate the core properties we commit to our users. No data loss, no data resurrection, transient or otherwise, due to defects in our code (meteors, tsunamis, etc notwithstanding).
> - The relationship of time spent on CI and stability of final full *post-commit* runs is asymptotic. It's not even 90/10; we're probably somewhere like 98% value gained from 10% of work, and the other 2% "stability" (i.e. green test suites, not "our database works") is a long-tail slog. Especially in the current ASF CI heterogeneous env w/ its current orchestration.
> - Thus: Pre-commit and post-commit should be different. The following points all apply to pre-commit:
> - The goal of pre-commit tests should be some number of 9's of no test failures post-commit (i.e. for every 20 green pre-commit runs we introduce 1 flake post-commit). Not full perfection; it's not worth the compute and complexity.
> - We should *build* all branches on all supported JDKs (8 + 11 for older, 11 + 17 for newer, etc).
> - We should *run* all test suites with the *recommended configuration* against the *highest versioned JDK a branch supports*. And we should formally recommend our users run on that JDK.
> - We should *at least* run all jvm-based configurations on the highest supported JDK version with the "not recommended but still supported" configuration.
> - I'm open to being persuaded that we should at least run jvm-unit tests on the older JDK w/ the conservative config pre-commit, but not much beyond that.
>
> That would leave us with the following distilled:
>
> *Pre-commit:*
>
> - Build on all supported JDKs
> - All test suites on highest supported JDK using recommended config
> - Repeat testing on new or changed tests on highest supported JDK w/ recommended config
> - JDK-based test suites on highest supported JDK using other config
>
> *Post-commit:*
>
> - Run everything. All suites, all supported JDKs, both config files.
>
> With Butler + the *jenkins-jira* integration script <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py> (need to dust that off but it should remain good to go), we should have a pretty clear view as to when any consistent regressions are introduced and why. We'd remain exposed to JDK-specific flake introductions and flakes in unchanged tests, but there's no getting around the 2nd one and I expect the former to be rare enough to not warrant the compute to prevent it.
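[Editor's note: to make the distilled pre-commit/post-commit split above concrete, here is a minimal sketch of the two matrices as plain data. It is purely illustrative: the branch names, JDK versions, suite names and config labels are assumptions for the sake of the example, not the project's actual CircleCI/Jenkins job definitions.]

# Illustrative sketch only -- not an actual CI definition.
# Branch, JDK, suite and config names below are assumptions.

# Hypothetical view of which JDKs a branch supports, newest last.
BRANCH_JDKS = {
    "cassandra-4.1": ["8", "11"],
    "trunk": ["11", "17"],
}

TEST_SUITES = ["units", "jvm-dtests", "dtests", "cqlsh"]  # illustrative suite names
JVM_SUITES = ["units", "jvm-dtests"]                      # the JVM-based subset
CONFIGS = ["conservative", "latest"]                      # e.g. cassandra.yaml vs cassandra_latest.yaml


def precommit_jobs(branch: str) -> list[str]:
    """Expand the distilled pre-commit plan into a flat list of job names."""
    jdks = BRANCH_JDKS[branch]
    newest = jdks[-1]
    jobs = [f"build jdk{j}" for j in jdks]                          # build on all supported JDKs
    jobs += [f"{s} jdk{newest} latest" for s in TEST_SUITES]        # all suites, newest JDK, recommended config
    jobs += [f"repeat-new-or-changed jdk{newest} latest"]           # repeat runs of new or changed tests
    jobs += [f"{s} jdk{newest} conservative" for s in JVM_SUITES]   # JVM suites, newest JDK, other config
    return jobs


def postcommit_jobs(branch: str) -> list[str]:
    """Post-commit: everything -- all suites, all supported JDKs, both configs."""
    return [f"{s} jdk{j} {c}"
            for j in BRANCH_JDKS[branch]
            for c in CONFIGS
            for s in TEST_SUITES]


if __name__ == "__main__":
    print("\n".join(precommit_jobs("trunk")))

[The asymmetry the sketch tries to capture: pre-commit stays roughly linear in the number of suites, while post-commit is the full JDK x config x suite cross product.]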
> On Thu, Feb 15, 2024, at 10:02 AM, Jon Haddad wrote:
>
> > Would it make sense to only block commits on the test strategy you've listed, and shift the entire massive test suite to post-commit? If there really is only a small % of times the entire suite is useful, this seems like it could unblock the dev cycle but still have the benefit of the full test suite.
> >
> > On Thu, Feb 15, 2024 at 3:18 AM Berenguer Blasi <berenguerbl...@gmail.com> wrote:
> >
> > > On reducing CircleCI usage during dev while iterating, not with the intention to replace the pre-commit CI (yet), we could get away with testing only dtests, jvm-dtests, units and cqlsh for a _single_ configuration imo. That would greatly reduce usage. I hacked it quickly here for illustration purposes: https://app.circleci.com/pipelines/github/bereng/cassandra/1164/workflows/3a47c9ef-6456-4190-b5a5-aea2aff641f1 The good thing is that we have the tooling to dial in whatever we decide atm.
> > >
> > > Changing pre-commit is a different discussion, to which I agree btw. But the above could save time and $ big time during dev and be done and merged in a matter of days imo.
> > >
> > > I can open a DISCUSS thread if we feel it's worth it.
> > >
> > > On 15/2/24 10:24, Mick Semb Wever wrote:
> > >
> > > > > Mick and Ekaterina (and everyone really) - any thoughts on what test coverage, if any, we should commit to for this new configuration? Acknowledging that we already have *a lot* of CI that we run.
> > > >
> > > > Branimir in this patch has already done some basic cleanup of test variations, so this is not a duplication of the pipeline. It's a significant improvement.
> > > >
> > > > I'm ok with cassandra_latest being committed and added to the pipeline, *if* the authors genuinely believe there's significant time and effort saved in doing so.
> > > >
> > > > How many broken tests are we talking about?
> > > > Are they consistently broken or flaky?
> > > > Are they ticketed up and 5.0-rc blockers?
> > > >
> > > > Having to deal with flakies and broken tests is an unfortunate reality of having a pipeline of 170k tests.
> > > >
> > > > Despite real frustrations I don't believe the broken windows analogy is appropriate here; it's more of a "leave the campground cleaner"… That being said, knowingly introducing a few broken tests is not that either, but still having to deal with a handful of consistently breaking tests for a short period of time is not the same cognitive burden as flakies. There are currently other broken tests in 5.0: VectorUpdateDeleteTest, upgrade_through_versions_test; are these compounding the frustrations?
> > > >
> > > > It's also been asked why we don't just enable the settings we recommend. These are settings we recommend for new clusters. Our existing cassandra.yaml needs to be tailored for existing clusters being upgraded, where we are very conservative about changing defaults.
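[Editor's note: referring back to Berenguer's dev-iteration proposal above (test only dtests, jvm-dtests, units and cqlsh against a single configuration while iterating), the same illustrative sketch style reduces to something like the following. Again, the suite, JDK and config names are assumptions, not the actual CircleCI job identifiers.]

# Purely illustrative; names are assumptions, not real CircleCI job identifiers.
# The cheap dev-iteration loop: four suite families, one config, one JDK.

DEV_SUITES = ["dtests", "jvm-dtests", "units", "cqlsh"]


def dev_loop_jobs(jdk: str = "17", config: str = "latest") -> list[str]:
    """Single-config, single-JDK subset intended only for iterating during dev."""
    return [f"{suite} jdk{jdk} {config}" for suite in DEV_SUITES]


# Example: dev_loop_jobs() ->
#   ['dtests jdk17 latest', 'jvm-dtests jdk17 latest', 'units jdk17 latest', 'cqlsh jdk17 latest']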