> What do you mean by this? A "branch of tools per GA branch” I don’t follow.

So if we have the following on C* as GA branches:
- cassandra-4.1
- cassandra-5.0
- cassandra-6.0

We'd have branches on the tools project for:
- cassandra-4.1
- cassandra-5.0
- cassandra-6.0

i.e. we mirror the C* upstream branching strategy and maintain compatibility between HEAD on both repos. That way we can make changes that are C*-version specific if needed w/out having to modernize the integration of tooling w/older C* branches.

If tooling for older branches is unlikely to change, then it seems like the following might be optimal:
1. new repo
2. branch strategy matching our primary consumer (C*)
   1. Backport changes selectively to older branches as needed
3. embed the tooling as a submodule in C*

Is that a distillation of what you're thinking David? Seems reasonable to me.

On Mon, Jun 8, 2026, at 6:24 PM, David Capwell wrote:
>> but that introduces the inverse problem where you'd have to make a change
>> across N branches on the shared library if you have a patch that introduces
>> testing that hits all our GA C* and need to backport that functionality
>> instead of changing it in one place.
>
> In the case I was talking about it's the Property, Gen, and Gens classes, and
> not cluster level tests (similar to python dtest); so don’t think that would
> happen?
>
>
>> • Do we expect the shared functionality in this lib would change frequently
>> in ways that would impact multiple branches, or do we think it would be
>> mostly stable for older branches and mutate more frequently on trunk?
>
> I went through our mailing list to see where this has been brought up and a
> common set brought up are "executors/futures/collections/concurrency
> utilities”. These cases I feel should be the same, that new features are for
> trunk and we don’t really need to back port to older branches unless there
> are bug fixes (in which case we bump the version). So I work with the
> assumption that back port to older branches isn’t that likely. Bug fixes
> might need a version bump but should be backwards compatible, new features
> should also not break the public API.
>
> One advantage of being a separate and versioned dependency is it's easier to
> track when the API is broken; in tree makes this more painful.
>
> Now, going through the history of this topic there is a group of things that
> I don’t think make sense to fork, and it's stuff like AbstractType / Index /
> IAuthenticator, etc… plugin authors want a way to handle building their
> plugins without Cassandra-all and these APIs are structurally cassandra
> related. The stuff I propose extracting out of the code base is generic and
> unaware of cassandra as a project.
>
>> • If the latter (mostly stable, trunk only changes) then having a branch of
>> tools per GA branch would be optimal
> What do you mean by this? A "branch of tools per GA branch” I don’t follow.
>
>> From a workflow perspective, a shared library factored out to its own repo
>> and embedded into C* branches as a submodule has some attractive properties
>> either way. It gives you "best of both worlds" (or least-worst-option) by
>> allowing you to work on things seamlessly as though they were one project
>> but keep the branching strategies of the tooling and the dependents
>> decoupled. Even if we only had 1 branch of the test tooling that all C*
>> versions depended on, having it separate and embedded as a submodule should
>> give us the same devx ergonomics while preserving the option to customize
>> per C* branch fairly easily.
>
> Yep! While working on accord I never needed 2 different IDEs open, one for
> accord and one for cassandra; I was able to make changes as if it was a
> single project and the only complexity for development was making sure CI
> knew about my accord branch (we have a script in tree for that) and merge is
> 3 steps rather than 1 (merge accord, update cassandra to point to latest
> accord, merge cassandra).
>
> Sub modules do have down sides we are currently living with (as you have seen
> working with CI) and I do hope it's been mostly seamless for people…
>
> I can also see us trying out a hybrid model… trunk is submodule but once we
> fork a major branch we switch to release jars instead; we get the trunk level
> velocity and lose all the pain points of submodules when working in a
> release branch.
>
>> On Jun 8, 2026, at 7:25 AM, Josh McKenzie <[email protected]> wrote:
>>
>>> One other motivation for forking is that we can fix issues one time rather
>>> than have to fix in 5 branches that have slightly different versions of our
>>> libraries.
>> The pain on this one is real. Spit-balling, but I wonder if there'd be a way
>> to sustainably have all GA branches depend on this code from trunk and we
>> use testing and validation to ensure the code on trunk stays compatible with
>> older releases.
>>
>> There's a lot of complexity there since we'd need CI updated to run that
>> subset of tooling tests across all GA branches before a commit (i.e. trunk
>> only changes would then potentially impact all GA branches), but maybe that
>> actually wouldn't be so bad if we just had a new pipeline that pulled and
>> built all GA branches from HEAD and ran through the tooling test suites
>> against those releases. That, and it'd only really be in scope if you were
>> making changes to that tooling. That said, it would seem pretty weird for
>> 5.0 to need to check out code from the trunk branch to build and run tests
>> against though... =/
>>
>>> My primary need is for test utilities so my focus is there.
>> Hm. Yeah, the more I think through this, having a versioned set of test
>> utilities in trunk for instance would definitely feel like "crossing the
>> streams" (i.e. PropertyTestingBase4.0, PropertyTestingBase4.1, etc). Big
>> separation of concerns / scope failure if people working on a trunk branch
>> in C* are having to think about other branches and API breakage with them
>> (moreso than we already have to w/mixed version upgrades etc.)
>>
>> Having things like that in a separate repo where we could cut iterate on
>> things to update for a single branch would alleviate that immediate
>> versioning / mismatch context leak, but that introduces the inverse problem
>> where you'd have to make a change across N branches on the shared library if
>> you have a patch that introduces testing that hits all our GA C* and need to
>> backport that functionality instead of changing it in one place.
>>
>> Blech.
>>
>> So as I was drafting the above, my thinking has distilled down to the
>> following as being important to have a shared mental model on:
>> • Do we expect the shared functionality in this lib would change frequently
>> in ways that would impact multiple branches, or do we think it would be
>> mostly stable for older branches and mutate more frequently on trunk?
>> • If the former (multi-branch impacting blast radius, we keep older GA
>> branches in sync / compatible with test harness changes), a single golden
>> copy of the shared code that each branch shares would minimize toil
>> • If the latter (mostly stable, trunk only changes) then having a branch
>> of tools per GA branch would be optimal
>>
>> From a workflow perspective, a shared library factored out to its own repo
>> and embedded into C* branches as a submodule has some attractive properties
>> either way. It gives you "best of both worlds" (or least-worst-option) by
>> allowing you to work on things seamlessly as though they were one project
>> but keep the branching strategies of the tooling and the dependents
>> decoupled. Even if we only had 1 branch of the test tooling that all C*
>> versions depended on, having it separate and embedded as a submodule should
>> give us the same devx ergonomics while preserving the option to customize
>> per C* branch fairly easily.
>>
>> On Fri, Jun 5, 2026, at 9:25 AM, David Capwell wrote:
>>> One other motivation for forking is that we can fix issues one time rather
>>> than have to fix in 5 branches that have slightly different versions of our
>>> libraries. A recent example is CASSANDRA-21216 which was a bug fix for
>>> btree.
>>>
>>> One of the other reasons brought up in the past is that many libraries are
>>> needed by accord but accord can’t depend on Cassandra else we have a
>>> cyclical dependency, so forking off lets accord use our libraries. For
>>> the time being accord had to fork many libraries in accord to make
>>> progress; this is a common issue right now.
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>>> On Jun 3, 2026, at 1:45 PM, Josh McKenzie <[email protected]> wrote:
>>>>
>>>>> delays this effort for years as we need time to get people on board and
>>>>> used to gradle before we flip that switch.
>>>> Oof. I'm way more optimistic on this one; if we can get a PR that has ant
>>>> targets as dumb wrappers that instead call gradle targets (i.e. all
>>>> workflows and local scripting Just Work), I don't see why we couldn't
>>>> merge that as soon as we ironed out kinks.
>>>>
>>>> Is there anyone that's broadly against that approach? Or did I just
>>>> misunderstand the other thread / JIRA you'd created David?
>>>>
>>>> On Wed, Jun 3, 2026, at 1:21 PM, David Capwell wrote:
>>>>> Fair point but one thing to point out, if this work depends on gradle
>>>>> that delays this effort for years as we need time to get people on board
>>>>> and used to gradle before we flip that switch. So leaving in tree means
>>>>> we have to hand roll all that logic in ant.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>>> On Jun 3, 2026, at 12:33 PM, Jon Haddad <[email protected]> wrote:
>>>>>>
>>>>>> Josh is right. Gradle subprojects could allow this without dealing with
>>>>>> separate repo. I've done this before and am about to again for some
>>>>>> stuff I maintain. I spent a long time agonizing over this for my other
>>>>>> projects and found it works exceptionally well, especially bc you
>>>>>> frequently develop things that are tightly coupled.
>>>>>>
>>>>>> Juggling repos sucks, this solves it (imo) perfectly.
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>> On Tue, Jun 2, 2026 at 1:18 PM Josh McKenzie <[email protected]> wrote:
>>>>>>>> Is there a reason not to use a folder in the current repo that becomes
>>>>>>>> its own jar? It can even be published separately if we like?
>>>>>>>
>>>>>>>> Mostly to decouple from Cassandra release.
>>>>>>> I *think* we could just have that .jar release on its own cadence
>>>>>>> independently of the parent C* project.
>>>>>>>
>>>>>>> Some of us have talked about taking this same approach to making some
>>>>>>> code from C* available to the ecosystem (think I/O .jar that has
>>>>>>> SSTable read/write, CommitLog read/write, etc). This feels like a very
>>>>>>> similarly shaped thing.
>>>>>>>
>>>>>>> I assume w/a modern build / publish / etc system we'd be able to
>>>>>>> publish a release that represents a strict subset of the parent project
>>>>>>> out of the repo right?
>>>>>>>
>>>>>>> On Mon, Jun 1, 2026, at 8:18 PM, David Capwell wrote:
>>>>>>>> Mostly to decouple from Cassandra release. If there is a feature
>>>>>>>> added does it have to wait for the next major release of Cassandra so
>>>>>>>> others can consume? Even if we can get to yearly releases that’s
>>>>>>>> still a long wait.
>>>>>>>>
>>>>>>>> For example Alex and I have been talking about proper fuzz testing, so
>>>>>>>> best case is a year before 3rd parties could use.
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>>> On Jun 1, 2026, at 4:32 PM, Jeremiah Jordan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Does it need to be a separate repo? Is there a reason not to use a
>>>>>>>>> folder in the current repo that becomes its own jar? It can even be
>>>>>>>>> published separately if we like?
>>>>>>>>>
>>>>>>>>> -Jeremiah
>>>>>>>>>
>>>>>>>>> On Jun 1, 2026 at 10:00:15 AM, David Capwell <[email protected]> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> We've discussed pulling utilities out of trunk before. I'd like to
>>>>>>>>>> actually start. My primary need is for test utilities so my focus
>>>>>>>>>> is there.
>>>>>>>>>>
>>>>>>>>>> This isn't just my need. Sidecar wants property/stateful tests but
>>>>>>>>>> can't use ours without a published jar.
>>>>>>>>>>
>>>>>>>>>> Proposed approach:
>>>>>>>>>>
>>>>>>>>>> 1. Define scope: start with property/stateful test utilities
>>>>>>>>>> 2. Set up the repo and release independently of Cassandra
>>>>>>>>>> 3. ...
>>>>>>>>>> 4. Cassandra depends on the library
>>>>>>>>>>
>>>>>>>>>> I'd focus on the fork first, before making Cassandra depend on it;
>>>>>>>>>> that keeps our builds simple and gives the lib room to stabilize. We
>>>>>>>>>> can sort out the dependency question later (wait on releases, or use
>>>>>>>>>> submodules?).
>>>>>>>>>>
>>>>>>>>>> Happy to drive this if there's interest.
>>>>>>>>>>
>>>>>>>>>> Sent from my iPhone
