BTW. I've created a simple workflow for GitHub Actions -
https://github.com/riptano/ccm/pull/771 - feel free to review

- - -- --- ----- -------- -------------
Jacek Lewandowski


sob., 25 maj 2024 o 06:09 Jacek Lewandowski <lewandowski.ja...@gmail.com>
napisał(a):

> Thank you for all the opinions. That is useful for future work on CCM.
>
> When I implemented the changes that caused recent headaches, I wasn't
> aware that the CCM code was so patchworked, which resulted in many
> surprises. I apologize for that. Anyway, there is no reason to sit and
> complain instead of fixing what I already found.
>
> There are some unit tests attached to CCM, and I wonder whether they are
> ever run before merging a patch because I'm unaware of any CI. Along with
> my recent patches, I've implemented quite comprehensive tests verifying
> whether the env updating function works as "expected", but it turned out
> the problems were outside that function. I'm making fun of the word
> "expected" because precisely defined contracts for CCM do not seem to
> exist. I'd like us to invest in unit tests and make them the contract
> guards. It is tough to verify whether a patch would cause problems in some
> environments; we usually have to run all workloads to confirm that, and
> this would not work in the future if we expect the community to get more
> involved. This recent incident was an excellent example of such a problem -
> I'm really thankful to Mick and Ariel for running all the Apache and Apple
> workloads to verify the patch, which consumed much of their time, and I
> cannot expect such involvement in the future.
>
> To the point, let's pay attention to the unit tests, make sure they are
> mocking Cassandra instead of running real programs so that they are fast,
> and add some CI—if they are fast unit tests, I bet we can rely on some free
> tier of GitHub actions. Adding the expectations we assumed when
> implementing the CI systems for Cassandra will make it less likely to break
> something accidentally.
>
> In the end, I feel we strayed off the topic a bit - my question was quite
> concrete - I'd like to remove the CASSANDRA_USE_JDK11 knob for CCM - it
> should set it appropriately for Cassandra 4 so that the CCM user does not
> have to bother. However, CCM should not use it to decide which Java version
> to use. I'm unaware of any release cycle of CCM versions anywhere. Perhaps
> we should do the following - tag a new version before the change and then
> add the proposed change.
>
> There are also other problems related to env setup - for example, when a
> user or the dtest framework wants to force a certain Java version, it is
> honored only for running a Cassandra node - it is not applied for running
> nodetool or any other command line tool. Therefore, there is a broader
> question about when the explicit Java version should be set - it feels like
> the correct approach would be to set it up when a node is created rather
> than when it is started so that the selection applies to running the server
> and all the commands. This would simplify things significantly - instead of
> resolving env and checking Java distributions each time we are about to run
> a node or a tool - resolve the required env changes once we create a node
> or when we update the installation directory, which we do when testing
> upgrades. Such simplification would also remove some clutter from the logs.
> Can you remember the whole environment logged frequently and twice when
> running a node?
>
> Can we make this discussion conclusive?
>
> Thanks!
>
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
>
>
> pt., 24 maj 2024 o 20:55 Josh McKenzie <jmcken...@apache.org> napisał(a):
>
>> The scripts that are in cassandra-builds seem like a starting point for
>> converging different CI systems so that they run the same set of tests in
>> as similar environments as possible
>>
>> Yeah, I took a superset of circle and ASF tests to try and run
>> :allthethings:. Part of how the checkstyle dependency check got in the way
>> too, since we weren't running that on ASF CI. :)
>>
>> Strong +1 to more convergence on what we're running in CI for sure.
>>
>> On Fri, May 24, 2024, at 11:59 AM, Ariel Weisberg wrote:
>>
>> Hi,
>>
>> There is definitely a mismatch between how the full range of dtests work
>> and the direction CCM is going in and we have some difficulty getting those
>> to match. I fully empathize with several of those CI systems not being
>> publicly visible/accessible, and the behavior of upgrade paths being
>> absolutely inscrutable relative to the environment variables that are set.
>>
>> I am happy to volunteer to test things in advance on Apple's CI. I'll
>> also try to get on top of responding faster :-)
>>
>> The window where reverting is useful is slightly past now that all the
>> issues I am aware of have been fixed, but in the future I think the burden
>> for revert might need to be lower. It's tough those because putting the
>> burden on ASF for non-ASF CI is not necessarily a given.
>>
>> There is a big gap between CI systems where how they invoke the dtests
>> determines the exact set of tests they run and how they invoke CCM (and
>> which CCM bugs they expose). I really don't like this approach including
>> relying on environment variables to dictate dtests execution behavior. I
>> hope to have some time to spend on this once my live migration work is in a
>> better place.
>>
>> Right now ASF CI is not running the upgrade paths that trigger JDK
>> version switching which is at the root of our recent problems. Once we
>> close that gap we should be in a much better place in terms of divergence.
>>
>> The scripts that are in cassandra-builds seem like a starting point for
>> converging different CI systems so that they run the same set of tests in
>> as similar environments as possible and harness specific quirks are pushed
>> into specific integration points where things like pointing to private
>> mirrors is supported.
>>
>> Additionally what I would like to see is that CI harnesses specify the
>> location of all JDKs, and then provide flags (not environment variables) to
>> the dtests that dictate what should be run. What is currently in Java path
>> or Java home shouldn't be relevant for any dtests IMO, I would like the
>> dtests (themselves or delegating to CCM) to juggle that themselves.
>>
>> Those flags should also be as declarative as possible and require
>> specifying C* versions and JDK versions so if you want to run the set of
>> tests we required to commit you don't need keep changing how the dtests are
>> invoked.
>>
>> Ariel
>>
>> On Thu, May 23, 2024, at 6:22 AM, Mick Semb Wever wrote:
>>
>> When starting Cassandra nodes, CCM uses the current env Java distribution
>> (defined by the JAVA_HOME env variable). This behavior is overridden in
>> three cases:
>>
>> - Java version is not supported by the selected Cassandra distribution -
>> in which case, CCM looks for supported Java distribution across JAVAx_HOME
>> env variables
>>
>> - Java version is specified explicitly (--jvm-version arg or jvm_version
>> param if used in Python)
>>
>> - CASSANDRA_USE_JDK11 is defined in env, in which case, for Cassandra 4.x
>> CCM forces to use only JDK11
>>
>>
>> I want to ask you guys whether you are okay with removing the third
>> exception. If we remove it, Cassandra 4.x will not be treated in any
>> special way—CCM will use the current Java version, so if it is Java 11, it
>> will use Java 11 (and automatically set CASSANDRA_USE_JDK11), and if it is
>> Java 8, it will use Java 8 (and automatically unset CASSANDRA_USE_JDK11).
>>
>>
>> I think there is no need for CCM to use CASSANDRA_USE_JDK11 to make a
>> decision about which Java version to use as it adds more complexity, makes
>> it work differently for Cassandra 4.x than for other Cassandra versions,
>> and actually provides no value at all because if we work with Cassandra
>> having our env configured for Java 11, we have to have CASSANDRA_USE_JDK11
>> and if not, we cannot have it. Therefore, CCM can be based solely on the
>> current Java version and not include the existence of CASSANDRA_USE_JDK11
>> in the Java version selection process.
>>
>> WDYT?
>>
>>
>>
>> With the recent commits to ccm we have now broken three different CI
>> systems, in numerous different ways.  All remain broken.
>>
>> At this point in time, the default behaviour should be to revert those
>> commits.  Not to discuss whether we can further remove existing
>> functionality on the assumption we know all consumers, or that they are all
>> reading this thread and agreeing.
>>
>> In ccm, the jdk selection and switching does indeed deserve a clean up.
>> We have found a number of superfluous ways of achieving the same thing that
>> is leading to unnecessary code complexity.  But we should not be hard
>> breaking things for downstream users and our CI.
>>
>> The initial commit to ccm that broke things was to fix ccm running a
>> binary 5.0-beta1 with a particular jdk.  This patch and subsequent fixes
>> has included additional refactoring/cleaning changes that have broken a
>> number of things, like jdk-switching and upgrade_through_versions tests.
>> We keep trying to fix each breakage, but are also including additional
>> adjustments "to do the right thing" that only ends up breaking yet another
>> thing.   This shouldn't be how we apply changes to a library that has many
>> (unknown) consumers, nor that we don't have full test coverage on.
>>
>> Given the broken CI systems and the troubles we have already caused
>> consumers, my recommendation is that these commits are reverted, and we
>> live with the binary 5.0-beta1 breakage for now, while we more patiently
>> work on a more complete and thorough fix.  Furthermore to the specific
>> question in the post, I don't believe we should be removing working
>> functionality without first a deprecation cycle, given that ccm has many
>> unknown consumers.  This depreciation period can be time-based, since ccm
>> doesn't have versions.
>>
>>
>>
>>
>>
>>
>>
>>
>>

Reply via email to