Re: [DISCUSS] Forking Cassandra utilities into a separately released library

Josh McKenzie Mon, 08 Jun 2026 07:26:05 -0700

> One other motivation for forking is that we can fix issues one time rather 
> than have to fix in 5 branches that have slightly different versions of our 
> libraries. 
The pain on this one is real. Spit-balling, but I wonder if there'd be a way to 
sustainably have all GA branches depend on this code from trunk, and use 
testing and validation to ensure the code on trunk stays compatible with older 
releases.


There's a lot of complexity there since we'd need CI updated to run that subset 
of tooling tests across all GA branches before a commit (i.e. trunk only 
changes would then potentially impact all GA branches), but maybe that actually 
wouldn't be so bad if we just had a new pipeline that pulled and built all GA 
branches from HEAD and ran through the tooling test suites against those 
releases. That, and it'd only really be in scope if you were making changes to 
that tooling. That said, it would seem pretty weird for 5.0 to need to check 
out code from the trunk branch to build and run tests against... =/
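
To make the scoping idea above concrete, here is a rough sketch of how such a pipeline could decide when to run and what to run. Everything named here is an assumption for illustration: the `test/tooling/` prefix, the list of GA branches, and the step labels are placeholders, not existing paths, CI jobs, or build targets.

```python
from typing import Iterable, List, Tuple

# Hypothetical values: the thread names neither the directory that would hold
# the shared tooling nor the exact set of GA branches.
TOOLING_PREFIXES: Tuple[str, ...] = ("test/tooling/",)
GA_BRANCHES: Tuple[str, ...] = ("cassandra-4.0", "cassandra-4.1", "cassandra-5.0")

# Placeholder step labels, not real build targets.
STEPS: Tuple[str, ...] = (
    "checkout HEAD",
    "build against trunk tooling",
    "run tooling test suites",
)


def touches_tooling(changed_paths: Iterable[str],
                    prefixes: Tuple[str, ...] = TOOLING_PREFIXES) -> bool:
    """True if any changed path lives under a shared-tooling directory."""
    return any(path.startswith(prefixes) for path in changed_paths)


def plan_validation(changed_paths: Iterable[str],
                    ga_branches: Tuple[str, ...] = GA_BRANCHES,
                    prefixes: Tuple[str, ...] = TOOLING_PREFIXES) -> List[Tuple[str, str]]:
    """Return (branch, step) pairs for the cross-branch pipeline.

    An empty plan means the commit does not touch the tooling, so only the
    normal trunk CI applies.
    """
    if not touches_tooling(list(changed_paths), prefixes):
        return []
    return [(branch, step) for branch in ga_branches for step in STEPS]
```

A commit that only touches server code yields an empty plan, while a commit under the tooling prefix fans out to every GA branch, which is where the extra CI cost would come from.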

> My primary need is for test utilities so my focus is there.
Hm. Yeah, the more I think through this, having a versioned set of test 
utilities in trunk for instance would definitely feel like "crossing the 
streams" (e.g. PropertyTestingBase4.0, PropertyTestingBase4.1, etc). Big 
separation of concerns / scope failure if people working on a trunk branch in 
C* are having to think about other branches and API breakage with them (more so 
than we already have to w/mixed version upgrades etc.)

Having things like that in a separate repo where we could iterate on things 
to update for a single branch would alleviate that immediate versioning / 
mismatch context leak, but that introduces the inverse problem where you'd have 
to make a change across N branches on the shared library if you have a patch 
that introduces testing that hits all our GA C* branches and you need to 
backport that functionality instead of changing it in one place.

Blech.

So as I was drafting the above, my thinking has distilled down to the following 
as being important to have a shared mental model on:
 • Do we expect the shared functionality in this lib would change frequently in 
ways that would impact multiple branches, or do we think it would be mostly 
stable for older branches and mutate more frequently on trunk?
   • If the former (multi-branch impacting blast radius, we keep older GA 
branches in sync / compatible with test harness changes), a single golden copy 
of the shared code that each branch shares would minimize toil
   • If the latter (mostly stable, trunk only changes) then having a branch of 
tools per GA branch would be optimal

From a workflow perspective, a shared library factored out to its own repo and 
embedded into C* branches as a submodule has some attractive properties either 
way. It gives you "best of both worlds" (or least-worst-option) by allowing 
you to work on things seamlessly as though they were one project but keep the 
branching strategies of the tooling and the dependents decoupled. Even if we 
only had 1 branch of the test tooling that all C* versions depended on, having 
it separate and embedded as a submodule should give us the same devx 
ergonomics while preserving the option to customize per C* branch fairly 
easily.
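
As a toy model of that decoupling (not a description of any existing setup): treat each C* branch as following one ref of the tooling repo, with a shared default and optional per-branch overrides. The ref names `main`, `tooling-4.1` and `tooling-5.0` are made up, and "ref" here just stands in for whatever commit each branch's submodule pin tracks.

```python
from typing import Dict, Iterable, Optional, Set

DEFAULT_TOOLING_REF = "main"  # hypothetical name for the single shared branch


def tooling_ref_for(cassandra_branch: str,
                    overrides: Optional[Dict[str, str]] = None) -> str:
    """Tooling ref a C* branch follows: its override, else the shared default."""
    return (overrides or {}).get(cassandra_branch, DEFAULT_TOOLING_REF)


def refs_to_patch(affected_branches: Iterable[str],
                  overrides: Optional[Dict[str, str]] = None) -> Set[str]:
    """Distinct tooling refs a fix must land on to reach every affected C* branch."""
    return {tooling_ref_for(branch, overrides) for branch in affected_branches}


branches = ["cassandra-4.1", "cassandra-5.0", "trunk"]

# Single golden copy: one patch reaches every branch.
golden = refs_to_patch(branches)

# Tooling branch per GA branch: the same fix has to be backported to each ref.
per_branch = refs_to_patch(
    branches,
    {"cassandra-4.1": "tooling-4.1", "cassandra-5.0": "tooling-5.0"},
)
```

With no overrides the fix lands in one place; every override added buys per-branch freedom at the cost of one more backport target, which is the trade-off in the two bullets above.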

On Fri, Jun 5, 2026, at 9:25 AM, David Capwell wrote:
> One other motivation for forking is that we can fix issues one time rather 
> than have to fix in 5 branches that have slightly different versions of our 
> libraries. A recent example is CASSANDRA-21216 which was a bug fix for btree. 
>  
> 
> One of the other reasons brought up in the past is that many libraries are 
> needed by accord but accord can’t depend on Cassandra else we have a cyclical 
> dependency, so forking off lets accord use our libraries.  For the time 
> being accord had to fork many libraries in accord to make progress; this is a 
> common issue right now.
> 
> 
> 
> Sent from my iPhone
> 
>> On Jun 3, 2026, at 1:45 PM, Josh McKenzie <[email protected]> wrote:
>> 
>>> delays this effort for years as we need time to get people on board and 
>>> used to gradle before we flip that switch. 
>> Oof. I'm way more optimistic on this one; if we can get a PR that has ant 
>> targets as dumb wrappers that instead call gradle targets (i.e. all 
>> workflows and local scripting Just Work), I don't see why we couldn't merge 
>> that as soon as we ironed out kinks.
>> 
>> Is there anyone that's broadly against that approach? Or did I just 
>> misunderstand the other thread / JIRA you'd created, David?
>> 
>> On Wed, Jun 3, 2026, at 1:21 PM, David Capwell wrote:
>>> Fair point but one thing to point out, if this work depends on gradle that 
>>> delays this effort for years as we need time to get people on board and 
>>> used to gradle before we flip that switch.  So leaving in tree means we 
>>> have to hand roll all that logic in ant. 
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Jun 3, 2026, at 12:33 PM, Jon Haddad <[email protected]> wrote:
>>>> 
>>>> Josh is right.  Gradle subprojects could allow this without dealing with a 
>>>> separate repo.  I've done this before and am about to again for some stuff 
>>>> I maintain.  I spent a long time agonizing over this for my other projects 
>>>> and found it works exceptionally well, especially bc you frequently 
>>>> develop things that are tightly coupled.  
>>>> 
>>>> Juggling repos sucks, this solves it (imo) perfectly.
>>>> 
>>>> Jon
>>>> 
>>>> On Tue, Jun 2, 2026 at 1:18 PM Josh McKenzie <[email protected]> wrote:
>>>>>> Is there a reason not to use a folder in the current repo that becomes 
>>>>>> its own jar?  It can even be published separately if we like?
>>>>> 
>>>>>> Mostly to decouple from Cassandra release.
>>>>> I *think* we could just have that .jar release on its own cadence 
>>>>> independently of the parent C* project.
>>>>> 
>>>>> Some of us have talked about taking this same approach to making some 
>>>>> code from C* available to the ecosystem (think I/O .jar that has SSTable 
>>>>> read/write, CommitLog read/write, etc). This feels like a very similarly 
>>>>> shaped thing.
>>>>> 
>>>>> I assume w/a modern build / publish / etc system we'd be able to publish 
>>>>> a release that represents a strict subset of the parent project out of 
>>>>> the repo, right?
>>>>> 
>>>>> On Mon, Jun 1, 2026, at 8:18 PM, David Capwell wrote:
>>>>>> Mostly to decouple from Cassandra release.  If there is a feature added 
>>>>>> does it have to wait for the next major release of Cassandra so others 
>>>>>> can consume?  Even if we can get to yearly releases that’s still a long 
>>>>>> wait.
>>>>>> 
>>>>>> For example Alex and I have been talking about proper fuzz testing, so 
>>>>>> best case is a year before 3rd parties could use.
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On Jun 1, 2026, at 4:32 PM, Jeremiah Jordan <[email protected]> wrote:
>>>>>>> 
>>>>>>> Does it need to be a separate repo? Is there a reason not to use a 
>>>>>>> folder in the current repo that becomes its own jar?  It can even be 
>>>>>>> published separately if we like?
>>>>>>> 
>>>>>>> -Jeremiah
>>>>>>> 
>>>>>>> On Jun 1, 2026 at 10:00:15 AM, David Capwell <[email protected]> wrote:
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> We've discussed pulling utilities out of trunk before. I'd like to 
>>>>>>>> actually start.  My primary need is for test utilities so my focus is 
>>>>>>>> there.
>>>>>>>> 
>>>>>>>> This isn't just my need. Sidecar wants property/stateful tests but 
>>>>>>>> can't use ours without a published jar.
>>>>>>>> 
>>>>>>>> Proposed approach:
>>>>>>>> 
>>>>>>>> 1. Define scope — start with property/stateful test utilities
>>>>>>>> 2. Set up the repo and release independently of Cassandra
>>>>>>>> 3. ...
>>>>>>>> 4. Cassandra depends on the library
>>>>>>>> 
>>>>>>>> I'd focus on the fork first, before making Cassandra depend on it — 
>>>>>>>> keeps our builds simple and gives the lib room to stabilize. We can 
>>>>>>>> sort out the dependency question later (wait on releases, or use 
>>>>>>>> submodules?).
>>>>>>>> 
>>>>>>>> Happy to drive this if there's interest.
>>>>>>>> 
>>>>>>>> Sent from my iPhone
>>>>> 
>>
