Hi, There is definitely a mismatch between how the full range of dtests work and the direction CCM is going in and we have some difficulty getting those to match. I fully empathize with several of those CI systems not being publicly visible/accessible, and the behavior of upgrade paths being absolutely inscrutable relative to the environment variables that are set.
I am happy to volunteer to test things in advance on Apple's CI. I'll also try to get on top of responding faster :-) The window where reverting is useful is slightly past now that all the issues I am aware of have been fixed, but in the future I think the burden for revert might need to be lower. It's tough those because putting the burden on ASF for non-ASF CI is not necessarily a given. There is a big gap between CI systems where how they invoke the dtests determines the exact set of tests they run and how they invoke CCM (and which CCM bugs they expose). I really don't like this approach including relying on environment variables to dictate dtests execution behavior. I hope to have some time to spend on this once my live migration work is in a better place. Right now ASF CI is not running the upgrade paths that trigger JDK version switching which is at the root of our recent problems. Once we close that gap we should be in a much better place in terms of divergence. The scripts that are in cassandra-builds seem like a starting point for converging different CI systems so that they run the same set of tests in as similar environments as possible and harness specific quirks are pushed into specific integration points where things like pointing to private mirrors is supported. Additionally what I would like to see is that CI harnesses specify the location of all JDKs, and then provide flags (not environment variables) to the dtests that dictate what should be run. What is currently in Java path or Java home shouldn't be relevant for any dtests IMO, I would like the dtests (themselves or delegating to CCM) to juggle that themselves. Those flags should also be as declarative as possible and require specifying C* versions and JDK versions so if you want to run the set of tests we required to commit you don't need keep changing how the dtests are invoked. Ariel On Thu, May 23, 2024, at 6:22 AM, Mick Semb Wever wrote: >> When starting Cassandra nodes, CCM uses the current env Java distribution >> (defined by the JAVA_HOME env variable). This behavior is overridden in >> three cases: >> >> - Java version is not supported by the selected Cassandra distribution - in >> which case, CCM looks for supported Java distribution across JAVAx_HOME env >> variables >> >> - Java version is specified explicitly (--jvm-version arg or jvm_version >> param if used in Python) >> >> - CASSANDRA_USE_JDK11 is defined in env, in which case, for Cassandra 4.x >> CCM forces to use only JDK11 >> >> >> >> I want to ask you guys whether you are okay with removing the third >> exception. If we remove it, Cassandra 4.x will not be treated in any special >> way—CCM will use the current Java version, so if it is Java 11, it will use >> Java 11 (and automatically set CASSANDRA_USE_JDK11), and if it is Java 8, it >> will use Java 8 (and automatically unset CASSANDRA_USE_JDK11). >> >> >> >> I think there is no need for CCM to use CASSANDRA_USE_JDK11 to make a >> decision about which Java version to use as it adds more complexity, makes >> it work differently for Cassandra 4.x than for other Cassandra versions, and >> actually provides no value at all because if we work with Cassandra having >> our env configured for Java 11, we have to have CASSANDRA_USE_JDK11 and if >> not, we cannot have it. Therefore, CCM can be based solely on the current >> Java version and not include the existence of CASSANDRA_USE_JDK11 in the >> Java version selection process. >> >> >> WDYT? > > > With the recent commits to ccm we have now broken three different CI systems, > in numerous different ways. All remain broken. > > At this point in time, the default behaviour should be to revert those > commits. Not to discuss whether we can further remove existing functionality > on the assumption we know all consumers, or that they are all reading this > thread and agreeing. > > In ccm, the jdk selection and switching does indeed deserve a clean up. We > have found a number of superfluous ways of achieving the same thing that is > leading to unnecessary code complexity. But we should not be hard breaking > things for downstream users and our CI. > > The initial commit to ccm that broke things was to fix ccm running a binary > 5.0-beta1 with a particular jdk. This patch and subsequent fixes has > included additional refactoring/cleaning changes that have broken a number of > things, like jdk-switching and upgrade_through_versions tests. We keep > trying to fix each breakage, but are also including additional adjustments > "to do the right thing" that only ends up breaking yet another thing. This > shouldn't be how we apply changes to a library that has many (unknown) > consumers, nor that we don't have full test coverage on. > > Given the broken CI systems and the troubles we have already caused > consumers, my recommendation is that these commits are reverted, and we live > with the binary 5.0-beta1 breakage for now, while we more patiently work on a > more complete and thorough fix. Furthermore to the specific question in the > post, I don't believe we should be removing working functionality without > first a deprecation cycle, given that ccm has many unknown consumers. This > depreciation period can be time-based, since ccm doesn't have versions. > > > > > >