> The scripts that are in cassandra-builds seem like a starting point for
> converging different CI systems so that they run the same set of tests in as
> similar environments as possible

Yeah, I took a superset of circle and ASF tests to try and run :allthethings:. That's part of how the checkstyle dependency check got in the way too, since we weren't running that on ASF CI. :)
Strong +1 to more convergence on what we're running in CI for sure.

On Fri, May 24, 2024, at 11:59 AM, Ariel Weisberg wrote:
> Hi,
>
> There is definitely a mismatch between how the full range of dtests work and
> the direction CCM is going in and we have some difficulty getting those to
> match. I fully empathize with several of those CI systems not being publicly
> visible/accessible, and the behavior of upgrade paths being absolutely
> inscrutable relative to the environment variables that are set.
>
> I am happy to volunteer to test things in advance on Apple's CI. I'll also
> try to get on top of responding faster :-)
>
> The window where reverting is useful is slightly past now that all the issues
> I am aware of have been fixed, but in the future I think the burden for
> revert might need to be lower. It's tough though because putting the burden
> on ASF for non-ASF CI is not necessarily a given.
>
> There is a big gap between CI systems where how they invoke the dtests
> determines the exact set of tests they run and how they invoke CCM (and which
> CCM bugs they expose). I really don't like this approach, including relying
> on environment variables to dictate dtests execution behavior. I hope to have
> some time to spend on this once my live migration work is in a better place.
>
> Right now ASF CI is not running the upgrade paths that trigger JDK version
> switching which is at the root of our recent problems. Once we close that gap
> we should be in a much better place in terms of divergence.
>
> The scripts that are in cassandra-builds seem like a starting point for
> converging different CI systems so that they run the same set of tests in as
> similar environments as possible and harness-specific quirks are pushed into
> specific integration points where things like pointing to private mirrors are
> supported.
>
> Additionally what I would like to see is that CI harnesses specify the
> location of all JDKs, and then provide flags (not environment variables) to
> the dtests that dictate what should be run. What is currently in Java path or
> Java home shouldn't be relevant for any dtests IMO; I would like the dtests
> (themselves or delegating to CCM) to juggle that themselves.
>
> Those flags should also be as declarative as possible and require specifying
> C* versions and JDK versions so if you want to run the set of tests we
> require to commit, you don't need to keep changing how the dtests are invoked.
>
> Ariel
>
> On Thu, May 23, 2024, at 6:22 AM, Mick Semb Wever wrote:
>>> When starting Cassandra nodes, CCM uses the current env Java distribution
>>> (defined by the JAVA_HOME env variable). This behavior is overridden in
>>> three cases:
>>>
>>> - Java version is not supported by the selected Cassandra distribution - in
>>> which case, CCM looks for a supported Java distribution across JAVAx_HOME
>>> env variables
>>>
>>> - Java version is specified explicitly (--jvm-version arg or jvm_version
>>> param if used in Python)
>>>
>>> - CASSANDRA_USE_JDK11 is defined in env, in which case, for Cassandra 4.x
>>> CCM forces the use of JDK11 only
>>>
>>> I want to ask you guys whether you are okay with removing the third
>>> exception. If we remove it, Cassandra 4.x will not be treated in any
>>> special way: CCM will use the current Java version, so if it is Java 11, it
>>> will use Java 11 (and automatically set CASSANDRA_USE_JDK11), and if it is
>>> Java 8, it will use Java 8 (and automatically unset CASSANDRA_USE_JDK11).
>>>
>>> I think there is no need for CCM to use CASSANDRA_USE_JDK11 to make a
>>> decision about which Java version to use as it adds more complexity, makes
>>> it work differently for Cassandra 4.x than for other Cassandra versions,
>>> and actually provides no value at all because if we work with Cassandra
>>> having our env configured for Java 11, we have to have CASSANDRA_USE_JDK11
>>> and if not, we cannot have it. Therefore, CCM can be based solely on the
>>> current Java version and not include the existence of CASSANDRA_USE_JDK11
>>> in the Java version selection process.
>>>
>>> WDYT?
>>
>> With the recent commits to ccm we have now broken three different CI
>> systems, in numerous different ways. All remain broken.
>>
>> At this point in time, the default behaviour should be to revert those
>> commits. Not to discuss whether we can further remove existing
>> functionality on the assumption we know all consumers, or that they are all
>> reading this thread and agreeing.
>>
>> In ccm, the jdk selection and switching does indeed deserve a clean up. We
>> have found a number of superfluous ways of achieving the same thing that are
>> leading to unnecessary code complexity. But we should not be hard breaking
>> things for downstream users and our CI.
>>
>> The initial commit to ccm that broke things was to fix ccm running a binary
>> 5.0-beta1 with a particular jdk. This patch and subsequent fixes have
>> included additional refactoring/cleaning changes that have broken a number
>> of things, like jdk-switching and upgrade_through_versions tests. We keep
>> trying to fix each breakage, but are also including additional adjustments
>> "to do the right thing" that only end up breaking yet another thing. This
>> shouldn't be how we apply changes to a library that has many (unknown)
>> consumers, nor to one that we don't have full test coverage on.
>>
>> Given the broken CI systems and the troubles we have already caused
>> consumers, my recommendation is that these commits are reverted, and we live
>> with the binary 5.0-beta1 breakage for now, while we more patiently work on
>> a more complete and thorough fix. Further to the specific question in
>> the post, I don't believe we should be removing working functionality
>> without a deprecation cycle first, given that ccm has many unknown
>> consumers. This deprecation period can be time-based, since ccm doesn't
>> have versions.
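
For anyone following along, here is a rough sketch of the JDK selection rules quoted above (the JAVA_HOME default plus the three overrides), with a switch for the proposal to drop the CASSANDRA_USE_JDK11 case. This is not CCM's code: the function name, the honor_use_jdk11 switch, the support table, and the order in which the overrides are checked are all assumptions made for illustration. The thread does not say which override wins when several apply.

```python
# Illustrative support table: Cassandra major version -> Java versions.
# An assumption for this sketch; CCM derives this differently.
SUPPORTED_JAVA = {3: (8,), 4: (8, 11), 5: (11, 17)}


def select_java_version(cassandra_major, current_java, env,
                        jvm_version=None, honor_use_jdk11=True):
    """Return the Java major version a node would be started with.

    current_java is the version JAVA_HOME points at; env is a dict of
    environment variables. The precedence below is assumed, not taken
    from CCM.
    """
    supported = SUPPORTED_JAVA[cassandra_major]

    # Override: version given explicitly (--jvm-version / jvm_version).
    if jvm_version is not None:
        return jvm_version

    # Override under discussion: CASSANDRA_USE_JDK11 forces JDK11 on 4.x.
    if (honor_use_jdk11 and cassandra_major == 4
            and "CASSANDRA_USE_JDK11" in env):
        return 11

    # Default: use whatever JAVA_HOME points at, if it is supported.
    if current_java in supported:
        return current_java

    # Override: current Java unsupported, so search JAVAx_HOME variables.
    for candidate in supported:
        if f"JAVA{candidate}_HOME" in env:
            return candidate

    raise RuntimeError(
        f"no supported JDK found for Cassandra {cassandra_major}.x")


env = {"CASSANDRA_USE_JDK11": "true"}
print(select_java_version(4, 8, env))                         # 11
print(select_java_version(4, 8, env, honor_use_jdk11=False))  # 8
```

The last two calls show the case the deprecation concern is about: an environment that sets CASSANDRA_USE_JDK11 while JAVA_HOME points at Java 8 gets a different JDK once the third override is removed.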