> The scripts that are in cassandra-builds seem like a starting point for 
> converging different CI systems so that they run the same set of tests in as 
> similar environments as possible
Yeah, I took a superset of circle and ASF tests to try and run :allthethings:. 
That's part of how the checkstyle dependency check got in the way too, since 
we weren't running that on ASF CI. :)

Strong +1 to more convergence on what we're running in CI for sure.

On Fri, May 24, 2024, at 11:59 AM, Ariel Weisberg wrote:
> Hi,
> 
> There is definitely a mismatch between how the full range of dtests work and 
> the direction CCM is going in and we have some difficulty getting those to 
> match. I fully empathize with several of those CI systems not being publicly 
> visible/accessible, and the behavior of upgrade paths being absolutely 
> inscrutable relative to the environment variables that are set.
> 
> I am happy to volunteer to test things in advance on Apple's CI. I'll also 
> try to get on top of responding faster :-)
> 
> The window where reverting is useful is slightly past now that all the issues 
> I am aware of have been fixed, but in the future I think the burden for 
> revert might need to be lower. It's tough though because putting the burden on 
> ASF for non-ASF CI is not necessarily a given.
> 
> There is a big gap between CI systems where how they invoke the dtests 
> determines the exact set of tests they run and how they invoke CCM (and which 
> CCM bugs they expose). I really don't like this approach, including relying on 
> environment variables to dictate dtests execution behavior. I hope to have 
> some time to spend on this once my live migration work is in a better place.
> 
> Right now ASF CI is not running the upgrade paths that trigger JDK version 
> switching, which is at the root of our recent problems. Once we close that gap 
> we should be in a much better place in terms of divergence.
> 
> The scripts that are in cassandra-builds seem like a starting point for 
> converging different CI systems so that they run the same set of tests in as 
> similar environments as possible and harness specific quirks are pushed into 
> specific integration points where things like pointing to private mirrors are 
> supported.
> 
> Additionally what I would like to see is that CI harnesses specify the 
> location of all JDKs, and then provide flags (not environment variables) to 
> the dtests that dictate what should be run. What is currently in Java path or 
> Java home shouldn't be relevant for any dtests IMO, I would like the dtests 
> (themselves or delegating to CCM) to juggle that themselves.
> 
> Those flags should also be as declarative as possible and require specifying 
> C* versions and JDK versions, so if you want to run the set of tests we 
> require to commit, you don't need to keep changing how the dtests are invoked. 
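
A minimal sketch of what such declarative flags could look like. Every flag
name here (--jdk, --cassandra-version, --run-jdk) is hypothetical and is not
an existing cassandra-dtest or CCM option; the point is only that the harness
states where each JDK lives and which C*/JDK combinations to run, and nothing
is read from JAVA_HOME or PATH.

```python
import argparse


def parse_jdk(spec: str):
    """Parse 'VERSION=PATH', e.g. '11=/opt/jdk11', into (11, '/opt/jdk11')."""
    version, sep, path = spec.partition("=")
    if not sep or not version.isdigit() or not path:
        raise argparse.ArgumentTypeError(f"expected VERSION=PATH, got {spec!r}")
    return int(version), path


def build_parser() -> argparse.ArgumentParser:
    # All flags below are illustrative assumptions, not real dtest options.
    parser = argparse.ArgumentParser(description="hypothetical dtest flags")
    parser.add_argument("--jdk", action="append", type=parse_jdk, default=[],
                        metavar="VERSION=PATH",
                        help="location of one JDK; repeat for each JDK")
    parser.add_argument("--cassandra-version", action="append", default=[],
                        help="C* version to test; repeatable")
    parser.add_argument("--run-jdk", action="append", type=int, default=[],
                        help="JDK version to run the tests on; repeatable")
    return parser


def plan(argv):
    """Return one (cassandra_version, jdk_version, jdk_home) per combination."""
    args = build_parser().parse_args(argv)
    jdks = dict(args.jdk)
    missing = [v for v in args.run_jdk if v not in jdks]
    if missing:
        raise SystemExit(f"no --jdk location given for JDK version(s): {missing}")
    return [(c, j, jdks[j])
            for c in args.cassandra_version
            for j in args.run_jdk]
```

With this shape, the same invocation describes the same test matrix on every
CI system; only the paths passed to --jdk differ per harness.
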
> 
> Ariel
> 
> On Thu, May 23, 2024, at 6:22 AM, Mick Semb Wever wrote:
>>> When starting Cassandra nodes, CCM uses the current env Java distribution 
>>> (defined by the JAVA_HOME env variable). This behavior is overridden in 
>>> three cases:
>>> 
>>> - Java version is not supported by the selected Cassandra distribution - in 
>>> which case, CCM looks for supported Java distribution across JAVAx_HOME env 
>>> variables
>>> 
>>> - Java version is specified explicitly (--jvm-version arg or jvm_version 
>>> param if used in Python)
>>> 
>>> - CASSANDRA_USE_JDK11 is defined in env, in which case, for Cassandra 4.x, 
>>> CCM forces the use of JDK11 only
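
A rough sketch of the selection rules listed above, not CCM's actual code.
The function name and parameters are made up for illustration. The thread
does not state the precedence between the three overrides, how the supported
Java versions are determined, or the search order across JAVAx_HOME, so the
choices below (explicit version first, then CASSANDRA_USE_JDK11, lowest
supported version first) are assumptions.

```python
from typing import Mapping, Optional, Set, Tuple


def pick_java(env: Mapping[str, str],
              current_version: int,
              supported: Set[int],
              cassandra_major: int,
              explicit_version: Optional[int] = None) -> Tuple[int, str]:
    """Return (java_version, java_home) for starting one node.

    current_version is the Java version JAVA_HOME points at; supported is
    the set of Java versions the selected Cassandra distribution accepts.
    Both are supplied by the caller in this sketch.
    """

    def home_for(version: int) -> str:
        # Reuse JAVA_HOME when it already matches, else look at JAVAx_HOME.
        if version == current_version and "JAVA_HOME" in env:
            return env["JAVA_HOME"]
        key = f"JAVA{version}_HOME"
        if key not in env:
            raise LookupError(f"{key} is not set")
        return env[key]

    # Override: Java version given explicitly (--jvm-version / jvm_version).
    if explicit_version is not None:
        return explicit_version, home_for(explicit_version)

    # Override: CASSANDRA_USE_JDK11 is defined and this is Cassandra 4.x.
    if "CASSANDRA_USE_JDK11" in env and cassandra_major == 4:
        return 11, home_for(11)

    # Default: the current env Java, when this Cassandra supports it.
    if current_version in supported:
        return current_version, home_for(current_version)

    # Override: current Java unsupported, so search the JAVAx_HOME variables.
    for version in sorted(supported):
        if f"JAVA{version}_HOME" in env:
            return version, env[f"JAVA{version}_HOME"]
    raise LookupError("no supported Java distribution found")
```

Dropping the CASSANDRA_USE_JDK11 branch would leave the function depending
only on the current Java version, the supported set, and an explicit request.
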
>>> 
>>> 
>>> 
>>> I want to ask you guys whether you are okay with removing the third 
>>> exception. If we remove it, Cassandra 4.x will not be treated in any 
>>> special way—CCM will use the current Java version, so if it is Java 11, it 
>>> will use Java 11 (and automatically set CASSANDRA_USE_JDK11), and if it is 
>>> Java 8, it will use Java 8 (and automatically unset CASSANDRA_USE_JDK11). 
>>> 
>>> 
>>> 
>>> I think there is no need for CCM to use CASSANDRA_USE_JDK11 to make a 
>>> decision about which Java version to use as it adds more complexity, makes 
>>> it work differently for Cassandra 4.x than for other Cassandra versions, 
>>> and actually provides no value at all because if we work with Cassandra 
>>> having our env configured for Java 11, we have to have CASSANDRA_USE_JDK11 
>>> and if not, we cannot have it. Therefore, CCM can be based solely on the 
>>> current Java version and not include the existence of CASSANDRA_USE_JDK11 
>>> in the Java version selection process.
>>> 
>>> 
>>> WDYT? 
>> 
>>  
>> With the recent commits to ccm we have now broken three different CI 
>> systems, in numerous different ways.  All remain broken.
>> 
>> At this point in time, the default behaviour should be to revert those 
>> commits.  Not to discuss whether we can further remove existing 
>> functionality on the assumption we know all consumers, or that they are all 
>> reading this thread and agreeing.
>> 
>> In ccm, the jdk selection and switching does indeed deserve a clean up.  We 
>> have found a number of superfluous ways of achieving the same thing that are 
>> leading to unnecessary code complexity.  But we should not be hard breaking 
>> things for downstream users and our CI.
>> 
>> The initial commit to ccm that broke things was to fix ccm running a binary 
>> 5.0-beta1 with a particular jdk.  This patch and subsequent fixes have 
>> included additional refactoring/cleaning changes that have broken a number 
>> of things, like jdk-switching and upgrade_through_versions tests.  We keep 
>> trying to fix each breakage, but are also including additional adjustments 
>> "to do the right thing" that only end up breaking yet another thing.  This 
>> shouldn't be how we apply changes to a library that has many (unknown) 
>> consumers, nor to one that we don't have full test coverage on.
>> 
>> Given the broken CI systems and the troubles we have already caused 
>> consumers, my recommendation is that these commits are reverted, and we live 
>> with the binary 5.0-beta1 breakage for now, while we more patiently work on 
>> a more complete and thorough fix.  Furthermore to the specific question in 
>> the post, I don't believe we should be removing working functionality 
>> without first a deprecation cycle, given that ccm has many unknown 
>> consumers.  This deprecation period can be time-based, since ccm doesn't 
>> have versions.
>> 
> 
