BTW. I've created a simple workflow for GitHub Actions - https://github.com/riptano/ccm/pull/771 - feel free to review
- - -- --- ----- -------- ------------- Jacek Lewandowski sob., 25 maj 2024 o 06:09 Jacek Lewandowski <lewandowski.ja...@gmail.com> napisał(a): > Thank you for all the opinions. That is useful for future work on CCM. > > When I implemented the changes that caused recent headaches, I wasn't > aware that the CCM code was so patchworked, which resulted in many > surprises. I apologize for that. Anyway, there is no reason to sit and > complain instead of fixing what I already found. > > There are some unit tests attached to CCM, and I wonder whether they are > ever run before merging a patch because I'm unaware of any CI. Along with > my recent patches, I've implemented quite comprehensive tests verifying > whether the env updating function works as "expected", but it turned out > the problems were outside that function. I'm making fun of the word > "expected" because precisely defined contracts for CCM do not seem to > exist. I'd like us to invest in unit tests and make them the contract > guards. It is tough to verify whether a patch would cause problems in some > environments; we usually have to run all workloads to confirm that, and > this would not work in the future if we expect the community to get more > involved. This recent incident was an excellent example of such a problem - > I'm really thankful to Mick and Ariel for running all the Apache and Apple > workloads to verify the patch, which consumed much of their time, and I > cannot expect such involvement in the future. > > To the point, let's pay attention to the unit tests, make sure they are > mocking Cassandra instead of running real programs so that they are fast, > and add some CI—if they are fast unit tests, I bet we can rely on some free > tier of GitHub actions. Adding the expectations we assumed when > implementing the CI systems for Cassandra will make it less likely to break > something accidentally. > > In the end, I feel we strayed off the topic a bit - my question was quite > concrete - I'd like to remove the CASSANDRA_USE_JDK11 knob for CCM - it > should set it appropriately for Cassandra 4 so that the CCM user does not > have to bother. However, CCM should not use it to decide which Java version > to use. I'm unaware of any release cycle of CCM versions anywhere. Perhaps > we should do the following - tag a new version before the change and then > add the proposed change. > > There are also other problems related to env setup - for example, when a > user or the dtest framework wants to force a certain Java version, it is > honored only for running a Cassandra node - it is not applied for running > nodetool or any other command line tool. Therefore, there is a broader > question about when the explicit Java version should be set - it feels like > the correct approach would be to set it up when a node is created rather > than when it is started so that the selection applies to running the server > and all the commands. This would simplify things significantly - instead of > resolving env and checking Java distributions each time we are about to run > a node or a tool - resolve the required env changes once we create a node > or when we update the installation directory, which we do when testing > upgrades. Such simplification would also remove some clutter from the logs. > Can you remember the whole environment logged frequently and twice when > running a node? > > Can we make this discussion conclusive? > > Thanks! > > - - -- --- ----- -------- ------------- > Jacek Lewandowski > > > pt., 24 maj 2024 o 20:55 Josh McKenzie <jmcken...@apache.org> napisał(a): > >> The scripts that are in cassandra-builds seem like a starting point for >> converging different CI systems so that they run the same set of tests in >> as similar environments as possible >> >> Yeah, I took a superset of circle and ASF tests to try and run >> :allthethings:. Part of how the checkstyle dependency check got in the way >> too, since we weren't running that on ASF CI. :) >> >> Strong +1 to more convergence on what we're running in CI for sure. >> >> On Fri, May 24, 2024, at 11:59 AM, Ariel Weisberg wrote: >> >> Hi, >> >> There is definitely a mismatch between how the full range of dtests work >> and the direction CCM is going in and we have some difficulty getting those >> to match. I fully empathize with several of those CI systems not being >> publicly visible/accessible, and the behavior of upgrade paths being >> absolutely inscrutable relative to the environment variables that are set. >> >> I am happy to volunteer to test things in advance on Apple's CI. I'll >> also try to get on top of responding faster :-) >> >> The window where reverting is useful is slightly past now that all the >> issues I am aware of have been fixed, but in the future I think the burden >> for revert might need to be lower. It's tough those because putting the >> burden on ASF for non-ASF CI is not necessarily a given. >> >> There is a big gap between CI systems where how they invoke the dtests >> determines the exact set of tests they run and how they invoke CCM (and >> which CCM bugs they expose). I really don't like this approach including >> relying on environment variables to dictate dtests execution behavior. I >> hope to have some time to spend on this once my live migration work is in a >> better place. >> >> Right now ASF CI is not running the upgrade paths that trigger JDK >> version switching which is at the root of our recent problems. Once we >> close that gap we should be in a much better place in terms of divergence. >> >> The scripts that are in cassandra-builds seem like a starting point for >> converging different CI systems so that they run the same set of tests in >> as similar environments as possible and harness specific quirks are pushed >> into specific integration points where things like pointing to private >> mirrors is supported. >> >> Additionally what I would like to see is that CI harnesses specify the >> location of all JDKs, and then provide flags (not environment variables) to >> the dtests that dictate what should be run. What is currently in Java path >> or Java home shouldn't be relevant for any dtests IMO, I would like the >> dtests (themselves or delegating to CCM) to juggle that themselves. >> >> Those flags should also be as declarative as possible and require >> specifying C* versions and JDK versions so if you want to run the set of >> tests we required to commit you don't need keep changing how the dtests are >> invoked. >> >> Ariel >> >> On Thu, May 23, 2024, at 6:22 AM, Mick Semb Wever wrote: >> >> When starting Cassandra nodes, CCM uses the current env Java distribution >> (defined by the JAVA_HOME env variable). This behavior is overridden in >> three cases: >> >> - Java version is not supported by the selected Cassandra distribution - >> in which case, CCM looks for supported Java distribution across JAVAx_HOME >> env variables >> >> - Java version is specified explicitly (--jvm-version arg or jvm_version >> param if used in Python) >> >> - CASSANDRA_USE_JDK11 is defined in env, in which case, for Cassandra 4.x >> CCM forces to use only JDK11 >> >> >> I want to ask you guys whether you are okay with removing the third >> exception. If we remove it, Cassandra 4.x will not be treated in any >> special way—CCM will use the current Java version, so if it is Java 11, it >> will use Java 11 (and automatically set CASSANDRA_USE_JDK11), and if it is >> Java 8, it will use Java 8 (and automatically unset CASSANDRA_USE_JDK11). >> >> >> I think there is no need for CCM to use CASSANDRA_USE_JDK11 to make a >> decision about which Java version to use as it adds more complexity, makes >> it work differently for Cassandra 4.x than for other Cassandra versions, >> and actually provides no value at all because if we work with Cassandra >> having our env configured for Java 11, we have to have CASSANDRA_USE_JDK11 >> and if not, we cannot have it. Therefore, CCM can be based solely on the >> current Java version and not include the existence of CASSANDRA_USE_JDK11 >> in the Java version selection process. >> >> WDYT? >> >> >> >> With the recent commits to ccm we have now broken three different CI >> systems, in numerous different ways. All remain broken. >> >> At this point in time, the default behaviour should be to revert those >> commits. Not to discuss whether we can further remove existing >> functionality on the assumption we know all consumers, or that they are all >> reading this thread and agreeing. >> >> In ccm, the jdk selection and switching does indeed deserve a clean up. >> We have found a number of superfluous ways of achieving the same thing that >> is leading to unnecessary code complexity. But we should not be hard >> breaking things for downstream users and our CI. >> >> The initial commit to ccm that broke things was to fix ccm running a >> binary 5.0-beta1 with a particular jdk. This patch and subsequent fixes >> has included additional refactoring/cleaning changes that have broken a >> number of things, like jdk-switching and upgrade_through_versions tests. >> We keep trying to fix each breakage, but are also including additional >> adjustments "to do the right thing" that only ends up breaking yet another >> thing. This shouldn't be how we apply changes to a library that has many >> (unknown) consumers, nor that we don't have full test coverage on. >> >> Given the broken CI systems and the troubles we have already caused >> consumers, my recommendation is that these commits are reverted, and we >> live with the binary 5.0-beta1 breakage for now, while we more patiently >> work on a more complete and thorough fix. Furthermore to the specific >> question in the post, I don't believe we should be removing working >> functionality without first a deprecation cycle, given that ccm has many >> unknown consumers. This depreciation period can be time-based, since ccm >> doesn't have versions. >> >> >> >> >> >> >> >> >>