Re: [DISCUSS] Proposal: We should merge the cassandra-analytics and cassandra-sidecar repositories

Josh McKenzie Tue, 09 Jun 2026 12:45:05 -0700

This thread has been quiet for a few days. Does anybody else have anything they
want to bring up before I start drafting a CEP for this work?


On Thu, Jun 4, 2026, at 12:36 PM, Patrick McFadin wrote:
> +1 on cassandra-ecosystem. Cassandra-buddy would be fun, but sadly, ecosystem 
> is more on brand for what this needs to be. 
> 
> +1 on a CEP just as a matter of record and consensus we can point people to 
> when they want to participate. 
> 
> Patrick
> 
> On Thu, Jun 4, 2026 at 9:32 AM Yifan Cai <[email protected]> wrote:
>> Happy to go with *cassandra-ecosystem*. The community enthusiasm for the 
>> name is a good signal in itself. 
>> The one mild concern I had was that "ecosystem" could imply Cassandra core 
>> is included in scope, but I think that is easily addressed with a clear 
>> repository description and README introduction. Consider my earlier 
>> suggestion withdrawn.
>> 
>> A CEP is a great idea, and it doesn't need to be exhaustive. It is a place 
>> to record the decisions made in this thread, so that they are explicitly 
>> committed to rather than informally agreed upon in a mailing list thread. 
>> It also directly addresses Jeremiah's concern: the stability annotations and 
>> CI enforcement mechanisms we discussed are exactly the kind of promises that 
>> belong in a CEP, where new contributors can find them and understand the 
>> expectations from day one.
>> 
>> - Yifan
>> 
>> On Thu, Jun 4, 2026 at 7:33 AM Ekaterina Dimitrova <[email protected]> 
>> wrote:
>>> The CEP proposal comes from the outcome I see emerging from this valuable
>>> discussion: people overall agree a merge is valuable, as long as the
>>> concerns outlined are hashed out.
>>> 
>>> On Thu, 4 Jun 2026 at 10:28, Ekaterina Dimitrova <[email protected]> 
>>> wrote:
>>>> Is this CEP-worthy?
>>>> 
>>>> To outline all concerns and expectations? 
>>>> - backwards compatibility
>>>> - releases
>>>> - API 
>>>> - repos
>>>> - Jira
>>>> - CI
>>>> Etc
>>>> 
>>>> It can also help us make some promises and work towards them, document
>>>> them more explicitly, and make it easier for anyone new to find out what
>>>> the expectations are. Does it make sense?
>>>> 
>>>> I mean, it doesn’t have to be a 10-page CEP.
>>>> 
>>>> 
>>>> On Thu, 4 Jun 2026 at 9:58, Josh McKenzie <[email protected]> wrote:
>>>>> I prefer cassandra-ecosystem over cassandra-companion. It keeps our
>>>>> options more open going forward (e.g. is a driver a companion? ... no?)
>>>>> 
>>>>> To your point, Jeremiah: while you'd think having the 2 projects in
>>>>> separate repos would force us to define cleaner APIs and versioning
>>>>> between them, in practice that's not the case today. The discipline /
>>>>> energy required to define a clear API boundary and rev it is probably
>>>>> comparable between the 2 paradigms (i.e. status quo dual repo: less
>>>>> discipline required, more energy; monorepo: more discipline required,
>>>>> less energy). At the end of the day I'd posit this is something we've
>>>>> been very poor at as a community across our entire ecosystem. This will
>>>>> be a new muscle for us to build regardless of how the repos are set up.
>>>>> 
>>>>> Ideally the 2 projects would be independent of one another and share an
>>>>> artifact they both depend upon, with that artifact's API being how we
>>>>> specify compatibility. That should be relatively straightforward to do
>>>>> in a monorepo with some refactoring, and if we can get to a shared
>>>>> library we publish from a cassandra-ecosystem repo, we can version that
>>>>> and then it's as simple as "if the projects you're working with support
>>>>> the same shared library version, they are compatible".
>>>>> 
>>>>> As I write that out, it strikes me that the shared information between 
>>>>> them could in theory one day be promoted to a higher architectural tier 
>>>>> of shared library where we factor out shared code from analytics and the 
>>>>> sidecar, and we factor out shared code from core Cassandra that the 
>>>>> ecosystem depends on (e.g. "cassandra-shared", or "cassandra-lib"). Then
>>>>> all 3 projects (+ drivers?) could take a dependency on that shared 
>>>>> library, we rev the version of that, and compatibility is defined by that 
>>>>> shared substrate.
>>>>> 
>>>>> All very "long term down the road" considerations, but the shape of "get 
>>>>> things closer together so they're easier to mutate and work with, then 
>>>>> massage the structure and dependencies to make the boundaries and 
>>>>> versioning clear through implicit structure" appeals to me.
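A minimal Java sketch of the shared-library idea above, assuming each project simply declares the set of shared-library versions it supports and that a combination of projects is compatible when at least one version is supported by all of them. `SharedLibCompat`, the project keys, and the version strings are hypothetical and do not correspond to any existing artifact.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: compatibility is defined purely by shared-library versions.
final class SharedLibCompat {
    private SharedLibCompat() {}

    /** Returns the shared-library versions supported by every project in the map. */
    static Set<String> commonVersions(Map<String, Set<String>> supportedByProject) {
        Set<String> common = null;
        for (Set<String> supported : supportedByProject.values()) {
            if (common == null) {
                common = new HashSet<>(supported); // start from the first project's set
            } else {
                common.retainAll(supported);       // intersect with each further project
            }
        }
        if (common == null) {
            return Set.of();
        }
        return common;
    }

    /** A set of projects is compatible if they share at least one library version. */
    static boolean compatible(Map<String, Set<String>> supportedByProject) {
        return !commonVersions(supportedByProject).isEmpty();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deployment = Map.of(
                "analytics", Set.of("2.0", "2.1"),
                "sidecar", Set.of("2.1", "2.2"),
                "core", Set.of("2.1"));
        System.out.println(commonVersions(deployment)); // [2.1]
        System.out.println(compatible(deployment));     // true
    }
}
```

Intersecting declared version sets keeps the rule declarative; in practice those sets would more likely be derived from published dependency metadata than written by hand.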
>>>>> 
>>>>> On Thu, Jun 4, 2026, at 6:00 AM, Shailaja Koppu wrote:
>>>>>> - I like the name cassandra-ecosystem
>>>>>> - We cannot draw a dependency direction between Analytics and Sidecar.
>>>>>> With the Analytics-on-S3 feature, Analytics can work without Sidecar.
>>>>>> Sidecar has many features that have nothing to do with Analytics. So
>>>>>> both can be independent of each other.
>>>>>> - The name cassandra-ecosystem allows us to integrate more such 
>>>>>> features/components into the repo
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> > On Jun 4, 2026, at 10:50 AM, Štefan Miklošovič 
>>>>>> > <[email protected]> wrote:
>>>>>> > 
>>>>>> > That all makes sense, Yifan.
>>>>>> > 
>>>>>> > The only issue (not really an issue, rather a consequence of doing it
>>>>>> > like that): imagine that there is a change in Analytics but none in
>>>>>> > Sidecar and we release a new version. Analytics would contain a new
>>>>>> > patch, but Sidecar would be a "dummy" release; we would bump the
>>>>>> > Sidecar version just for the sake of it. People trying to investigate
>>>>>> > what has changed between these versions would then realize that,
>>>>>> > awkwardly, nothing changed.
>>>>>> > 
>>>>>> > I can live with it. It is just something to be aware of.
>>>>>> > 
>>>>>> > On Thu, Jun 4, 2026 at 9:42 AM Yifan Cai <[email protected]> wrote:
>>>>>> >> 
>>>>>> >> Hi all,
>>>>>> >> 
>>>>>> >> Thanks for the great discussion so far. A few thoughts on the open 
>>>>>> >> questions:
>>>>>> >> 
>>>>>> >> Naming
>>>>>> >> 
>>>>>> >> I'd like to suggest cassandra-companion as the name for the merged 
>>>>>> >> repository. Both existing names create confusion in opposite 
>>>>>> >> directions: operational features like rolling restart and health 
>>>>>> >> monitoring feel out of place in cassandra-analytics (Joey's point), 
>>>>>> >> while a bulk read/write connector library feels out of place in 
>>>>>> >> cassandra-sidecar. A new neutral name avoids subordinating either 
>>>>>> >> project's identity to the other, and is broad enough to accommodate 
>>>>>> >> future additions beyond Analytics and Sidecar, without implying 
>>>>>> >> Cassandra core is included, as names like cassandra-ecosystem or 
>>>>>> >> cassandra-platform might.
>>>>>> >> 
>>>>>> >> For the JIRA project key, CASSCOMP would be a natural fit.
>>>>>> >> 
>>>>>> >> API Compatibility
>>>>>> >> 
>>>>>> >> Jeremiah raises a valid concern — co-locating the client and server 
>>>>>> >> removes the repo boundary that previously reminded developers they 
>>>>>> >> are touching a public API surface. Štefan's versioning model 
>>>>>> >> addresses the consumer-facing question ("what runs with what") well, 
>>>>>> >> but we also need developer-facing guardrails to mechanically enforce 
>>>>>> >> the promise. I'd propose combining three layers:
>>>>>> >> 
>>>>>> >> - Versioning contract (Štefan's model): same major.minor guarantees a
>>>>>> >>   compatible Analytics/Sidecar pair; patch releases of Sidecar are safe
>>>>>> >>   to advance independently; new REST endpoints require a minor bump.
>>>>>> >> - Unified version and release cadence: all modules release together
>>>>>> >>   under the same version number. This directly aligns with the merge's
>>>>>> >>   core motivation of reducing coordination overhead. The alternative,
>>>>>> >>   independent module versioning within the monorepo, would essentially
>>>>>> >>   recreate the cross-repo coordination friction we are trying to
>>>>>> >>   eliminate. Conveniently, Analytics and Sidecar are currently at the
>>>>>> >>   same version number, so there is no awkward jump or reset needed at
>>>>>> >>   the point of merge.
>>>>>> >> - CI enforcement: an OpenAPI contract test that fails if a change
>>>>>> >>   breaks the API surface relative to the previous release, plus a
>>>>>> >>   compatibility matrix test that runs the N-1 Analytics client against
>>>>>> >>   the current Sidecar server.
>>>>>> >> - Stability annotations: adopt @PublicApi / @InternalApi / @Stable /
>>>>>> >>   @Evolving / @Deprecated annotations on the Sidecar API surface,
>>>>>> >>   following the pattern established by Kafka and Elasticsearch. This
>>>>>> >>   makes the contract explicit and discoverable in code: a developer
>>>>>> >>   touching an annotated method immediately sees its stability guarantee
>>>>>> >>   and since which version it has been public.
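A rough sketch of how such stability annotations might look in Java. The annotation names come from the proposal, but their attributes (for example `since`), the `ExampleSidecarClient` interface, and the `StabilityReport` helper are hypothetical. For brevity this sketch treats anything without `@PublicApi` as internal instead of defining a separate `@InternalApi`, and it reuses the JDK's own `@Deprecated(since = ...)` (available since Java 9) rather than defining a new one.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Reads the hypothetical stability markers at runtime, e.g. for docs or a CI report.
final class StabilityReport {
    private StabilityReport() {}

    static String describe(Method m) {
        PublicApi api = m.getAnnotation(PublicApi.class);
        if (api == null) {
            return m.getName() + ": internal"; // no @PublicApi: treated as internal here
        }
        String level = m.isAnnotationPresent(Deprecated.class) ? "deprecated"
                : m.isAnnotationPresent(Stable.class) ? "stable"
                : m.isAnnotationPresent(Evolving.class) ? "evolving"
                : "unspecified";
        return m.getName() + ": public since " + api.since() + ", " + level;
    }

    static String describe(Class<?> type, String methodName) {
        try {
            return describe(type.getMethod(methodName));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no public method " + methodName, e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"health", "experimentalStatus", "legacyStatus"}) {
            System.out.println(describe(ExampleSidecarClient.class, name));
        }
    }
}

/** Hypothetical marker: part of the public API, recording the first release that exposed it. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface PublicApi {
    String since();
}

/** Hypothetical marker for a contract that is considered stable. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Stable {}

/** Hypothetical marker for a contract that may still change. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Evolving {}

/** Hypothetical client-facing interface showing how the markers might be combined. */
interface ExampleSidecarClient {
    @PublicApi(since = "1.0.0")
    @Stable
    String health();

    @PublicApi(since = "1.1.0")
    @Evolving
    String experimentalStatus();

    @PublicApi(since = "1.0.0")
    @Deprecated(since = "1.1.0") // the JDK's own annotation
    String legacyStatus();
}
```

Runtime retention is what lets a report or CI step read the markers via reflection; a source-only retention would be enough if the markers were meant purely for human readers and annotation processors.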
>>>>>> >> 
>>>>>> >> The three layers are complementary: the versioning model defines the 
>>>>>> >> promise, annotations mark the contract in code, and CI enforces the 
>>>>>> >> promise mechanically. The unified release cadence ensures the promise 
>>>>>> >> is always evaluated as a whole.
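A minimal sketch of the rule the CI layer would enforce, assuming the API surface of each release has already been reduced to a set of "METHOD /path" strings (a real check would more likely diff the two OpenAPI documents with dedicated tooling). The endpoint paths and the `ContractCheck` class are hypothetical. Removals are always flagged, and additions are flagged unless the version advances by at least a minor release, following the rules proposed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical contract check comparing the endpoint sets of two releases.
final class ContractCheck {
    private ContractCheck() {}

    /** Parses "major.minor.patch" and returns {major, minor}. */
    private static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** True if {@code to} is at least one minor (or major) release ahead of {@code from}. */
    private static boolean isMinorOrMajorBump(String from, String to) {
        int[] a = majorMinor(from);
        int[] b = majorMinor(to);
        return b[0] > a[0] || (b[0] == a[0] && b[1] > a[1]);
    }

    /** Returns the violations found; an empty list means the change passes. */
    static List<String> check(Set<String> previous, String previousVersion,
                              Set<String> current, String currentVersion) {
        List<String> violations = new ArrayList<>();

        // Removing an endpoint would break clients already deployed in the wild.
        Set<String> removed = new TreeSet<>(previous);
        removed.removeAll(current);
        for (String endpoint : removed) {
            violations.add("removed endpoint breaks existing clients: " + endpoint);
        }

        // New endpoints are only allowed together with a minor (or major) bump.
        Set<String> added = new TreeSet<>(current);
        added.removeAll(previous);
        if (!added.isEmpty() && !isMinorOrMajorBump(previousVersion, currentVersion)) {
            violations.add("new endpoints require a minor version bump: " + added);
        }
        return violations;
    }

    public static void main(String[] args) {
        Set<String> v100 = Set.of("GET /api/v1/example-a", "GET /api/v1/example-b");
        Set<String> next = Set.of("GET /api/v1/example-a", "GET /api/v1/example-b",
                                  "GET /api/v1/example-c");
        // Adding an endpoint in a patch release is flagged.
        System.out.println(check(v100, "1.0.0", next, "1.0.5"));
        // Adding the same endpoint in a minor release passes.
        System.out.println(check(v100, "1.0.0", next, "1.1.0"));
    }
}
```

The N-1 compatibility matrix test would complement this: the contract check catches surface changes statically, while running an older Analytics client against the current server catches behavioral breaks the endpoint list cannot see.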
>>>>>> >> 
>>>>>> >> As a side note — Cassandra core currently lacks this kind of API 
>>>>>> >> stability clarity, which creates real friction for downstream 
>>>>>> >> projects. Establishing this practice in the companion project gives 
>>>>>> >> us a concrete, working reference that could motivate and inform a 
>>>>>> >> broader Cassandra core evolution down the road. Happy to discuss that 
>>>>>> >> separately if there is interest.
>>>>>> >> 
>>>>>> >> Looking forward to hearing everyone's thoughts.
>>>>>> >> 
>>>>>> >> Thanks
>>>>>> >> - Yifan
>>>>>> >> 
>>>>>> >> On Wed, Jun 3, 2026 at 11:32 PM Štefan Miklošovič 
>>>>>> >> <[email protected]> wrote:
>>>>>> >>> 
>>>>>> >>> Hi Jeremiah,
>>>>>> >>> 
>>>>>> >>> For now, what I find difficult, and something I have found myself
>>>>>> >>> questioning repeatedly, is "what version of Sidecar can I run with
>>>>>> >>> Analytics?" Is Sidecar 0.2.0 compatible with Analytics 0.4.0? We just
>>>>>> >>> don't know until we run it and try. There is no compatibility matrix
>>>>>> >>> for what goes with what. If each component is developed independently,
>>>>>> >>> I think it will be messier than if they were released in lock-step.
>>>>>> >>> 
>>>>>> >>> We might establish a policy that, e.g., a patch release of Sidecar is
>>>>>> >>> compatible with the Analytics release of the same minor version. For
>>>>>> >>> example, we release both Sidecar and Analytics under unified version
>>>>>> >>> 1.0.0, then release 1.0.5 of both next. We can then say that Sidecar
>>>>>> >>> 1.0.5 is compatible with Analytics 1.0.0, or that Sidecar 1.1.5 is
>>>>>> >>> compatible with Analytics 1.1.0. Basically, Sidecar is a standalone
>>>>>> >>> server app a user can run without Analytics, but once they are
>>>>>> >>> interested in the Analytics combination, they would need to run it
>>>>>> >>> with the matching Analytics release.
>>>>>> >>> 
>>>>>> >>> If we release Analytics and Sidecar 1.1.0 and you have Sidecar 1.0.5,
>>>>>> >>> you would need to upgrade to 1.1.0 to be 100% sure that it is
>>>>>> >>> compatible with Analytics, whereas you could keep bumping Sidecar
>>>>>> >>> patch releases indefinitely if you are interested in Sidecar without
>>>>>> >>> Analytics.
>>>>>> >>> 
>>>>>> >>> This would, of course, require awareness of the question "will this
>>>>>> >>> patch I want to ship to Sidecar work with the related Analytics minor
>>>>>> >>> version when we release it?". We might also say that a new REST
>>>>>> >>> endpoint can only go into a new minor version, and similar.
>>>>>> >>> 
>>>>>> >>> This was, of course, just an example and it is all tweakable.
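The example policy above reduces to a one-line check. A minimal sketch, assuming plain `major.minor.patch` version strings and the rule that a Sidecar/Analytics pair is compatible when major and minor match while patch versions may differ; `PairCompatibility` is a hypothetical name, and pre-release or qualifier suffixes are not handled.

```java
// Hypothetical sketch of the "same major.minor means compatible" policy.
final class PairCompatibility {
    private PairCompatibility() {}

    /** Parses a strict "major.minor.patch" string into three integers. */
    private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < 3; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** Same major.minor means compatible; patch versions may advance independently. */
    static boolean compatible(String sidecarVersion, String analyticsVersion) {
        int[] s = parse(sidecarVersion);
        int[] a = parse(analyticsVersion);
        return s[0] == a[0] && s[1] == a[1];
    }

    public static void main(String[] args) {
        System.out.println(compatible("1.0.5", "1.0.0")); // true
        System.out.println(compatible("1.1.5", "1.1.0")); // true
        System.out.println(compatible("1.0.5", "1.1.0")); // false: move Sidecar to 1.1.x
    }
}
```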
>>>>>> >>> 
>>>>>> >>> On Wed, Jun 3, 2026 at 11:44 PM Jeremiah Jordan 
>>>>>> >>> <[email protected]> wrote:
>>>>>> >>>>> 
>>>>>> >>>>> I worry if we move into the Sidecar repo it's just going to become 
>>>>>> >>>>> more coupled and folks in the community are already using 
>>>>>> >>>>> Analytics to read from e.g. S3 buckets or other data sources.
>>>>>> >>>> 
>>>>>> >>>> 
>>>>>> >>>> I have similar concerns.  If we start releasing them in lockstep 
>>>>>> >>>> from the same repo, then I worry that people will start making 
>>>>>> >>>> breaking changes to sidecar APIs such that existing Analytics jars 
>>>>>> >>>> out in the wild will not work, without realizing it.
>>>>>> >>>> 
>>>>>> >>>> Both cassandra-analytics and the cassandra-sidecar are starting to 
>>>>>> >>>> be used out in the world by people in production settings.  My 
>>>>>> >>>> expectation for updates to the sidecar APIs is that anything done
>>>>>> >>>> should not break existing clients. When the client and the server
>>>>>> >>>> are in different repos, it is much cleaner and clearer to people
>>>>>> >>>> that you are exposing an API surface which is being consumed
>>>>>> >>>> externally, and that you need to keep things like backwards
>>>>>> >>>> compatibility in mind. If the client and the server live in the
>>>>>> >>>> same repo, and are released together, I can see people just 
>>>>>> >>>> changing/refactoring both and not considering existing clients out 
>>>>>> >>>> in the wild.  I think them being in separate repos makes that 
>>>>>> >>>> distinction clearer to someone working on a new feature that spans 
>>>>>> >>>> both code bases.
>>>>>> >>>> 
>>>>>> >>>> Seems like many here want them in the same repo, so I won’t block 
>>>>>> >>>> that, but I have concerns.
>>>>>> >>>> 
>>>>>> >>>> If we do decide to merge them, I think it should be in a new repo
>>>>>> >>>> with a new name. I do not think the sidecar belongs in a repo
>>>>>> >>>> named analytics, or the analytics library in a repo named
>>>>>> >>>> sidecar. They both have use cases that do not involve the other.
>>>>>> >>>> 
>>>>>> >>>> -Jeremiah Jordan
>>>>>> >>>> 
>>>>>> >>>> 
>>>>>> >>>> On Jun 3, 2026 at 11:42:15 AM, James Berragan <[email protected]> 
>>>>>> >>>> wrote:
>>>>>> >>>>> 
>>>>>> >>>>> Can we break down a bit more where the circular dependency lies?
>>>>>> >>>>> I'm not against it; I just want to make sure we're solving the
>>>>>> >>>>> right problem here. Analytics and CDC were always designed to be
>>>>>> >>>>> agnostic of the Sidecar. What stops us from moving just the
>>>>>> >>>>> Sidecar-specific parts into the Sidecar repo? I worry if we move into the
>>>>>> >>>>> Sidecar repo it's just going to become more coupled and folks in 
>>>>>> >>>>> the community are already using Analytics to read from e.g. S3 
>>>>>> >>>>> buckets or other data sources.
>>>>>> >>>>> 
>>>>>> >>>>> James.
>>>>>> >>>>> 
>>>>>> >>>>> On Tue, 2 Jun 2026 at 13:20, Josh McKenzie <[email protected]> 
>>>>>> >>>>> wrote:
>>>>>> >>>>>> 
>>>>>> >>>>>> I'd like to propose we merge the cassandra-sidecar and 
>>>>>> >>>>>> cassandra-analytics repositories. I've shopped the idea around to 
>>>>>> >>>>>> some of you and gotten universally positive feedback with some 
>>>>>> >>>>>> questions about details we deferred to this discussion.
>>>>>> >>>>>> 
>>>>>> >>>>>> Reasons we should merge:
>>>>>> >>>>>> 
>>>>>> >>>>>> - Break circular dependencies between the 2 projects
>>>>>> >>>>>> - Remove redundant copy/pasted code
>>>>>> >>>>>> - Simplify build and CI
>>>>>> >>>>>> - Reduce friction on changes that span both projects
>>>>>> >>>>>> - Simplify the CDC implementation
>>>>>> >>>>>> 
>>>>>> >>>>>> 
>>>>>> >>>>>> Outstanding questions and observations that came up:
>>>>>> >>>>>> 
>>>>>> >>>>>> - Do we merge one repository into the other? Or do we create a new
>>>>>> >>>>>>   project and bring them both in?
>>>>>> >>>>>> - What do we do about JIRA? Leave separate or combine?
>>>>>> >>>>>> - What do we do with open issues and PRs in GitHub?
>>>>>> >>>>>> - We'll need to thoughtfully update CI (GitHub + CircleCI) since
>>>>>> >>>>>>   we're right at the limit on the free tier on both projects
>>>>>> >>>>>> - What do we do about existing deprecated repositories
>>>>>> >>>>>>   (cassandra-analytics and/or cassandra-sidecar)?
>>>>>> >>>>>> - We'll need to update our release process
>>>>>> >>>>>> 
>>>>>> >>>>>> 
>>>>>> >>>>>> Other observations or questions welcome, as are thoughts on the 
>>>>>> >>>>>> entire process, on the outstanding questions, etc.
>>>>>> >>>>>> 
>>>>>> >>>>>> Looking forward to the discussion everyone.
>>>>>> >>>>>> 
>>>>>> >>>>>> ~Josh
>>>>>> 
>>>>>> 
>>>>>
