Re: [DISCUSS] Proposal: We should merge the cassandra-analytics and cassandra-sidecar repositories

Štefan Miklošovič Thu, 04 Jun 2026 02:50:54 -0700

That all makes sense, Yifan.

The only issue, and it is less an issue than a consequence of doing it
this way: imagine that there is a change in Analytics but none in
Sidecar, and we release a new version. Analytics would then contain a
new patch, but Sidecar would be a "dummy" release; we would bump its
version just for the sake of it. People investigating what changed
between those Sidecar versions would then find, awkwardly, that
nothing did.


I can live with it. It is just something to be aware of.
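
To make that concrete, here is a minimal sketch of the rule from the
thread below (same major.minor means a guaranteed-compatible pair, and
the patch number is free to move). The names Version, parse_version and
is_guaranteed_compatible are made up for illustration and are not part
of Sidecar or Analytics. It also assumes plain major.minor.patch
strings with no suffixes.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain 'major.minor.patch' string (hypothetical helper)."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.patch, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return Version(major, minor, patch)


def is_guaranteed_compatible(sidecar: str, analytics: str) -> bool:
    """Return True when both versions share major.minor; patch is ignored."""
    s = parse_version(sidecar)
    a = parse_version(analytics)
    return (s.major, s.minor) == (a.major, a.minor)
```

Under this rule a "dummy" Sidecar 1.0.6 with no changes is still
trivially compatible with Analytics 1.0.6, while Sidecar 1.0.5 paired
with Analytics 1.1.0 carries no guarantee until Sidecar moves to 1.1.x.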

On Thu, Jun 4, 2026 at 9:42 AM Yifan Cai <[email protected]> wrote:
>
> Hi all,
>
> Thanks for the great discussion so far. A few thoughts on the open questions:
>
> Naming
>
> I'd like to suggest cassandra-companion as the name for the merged 
> repository. Both existing names create confusion in opposite directions: 
> operational features like rolling restart and health monitoring feel out of 
> place in cassandra-analytics (Joey's point), while a bulk read/write 
> connector library feels out of place in cassandra-sidecar. A new neutral name 
> avoids subordinating either project's identity to the other, and is broad 
> enough to accommodate future additions beyond Analytics and Sidecar, without 
> implying Cassandra core is included, as names like cassandra-ecosystem or 
> cassandra-platform might.
>
> For the JIRA project key, CASSCOMP would be a natural fit.
>
> API Compatibility
>
> Jeremiah raises a valid concern — co-locating the client and server removes 
> the repo boundary that previously reminded developers they are touching a 
> public API surface. Štefan's versioning model addresses the consumer-facing 
> question ("what runs with what") well, but we also need developer-facing 
> guardrails to mechanically enforce the promise. I'd propose combining three 
> layers:
>
> - Versioning contract (Štefan's model): same major.minor guarantees a 
>   compatible Analytics/Sidecar pair; patch releases of Sidecar are safe to 
>   advance independently; new REST endpoints require a minor bump.
> - Unified version and release cadence: all modules release together under 
>   the same version number. This directly aligns with the merge's core 
>   motivation of reducing coordination overhead. The alternative, independent 
>   module versioning within the monorepo, would essentially recreate the 
>   cross-repo coordination friction we are trying to eliminate. Conveniently, 
>   Analytics and Sidecar are currently at the same version number, so there is 
>   no awkward jump or reset needed at the point of merge.
> - CI enforcement: an OpenAPI contract test that fails if a change breaks the 
>   API surface relative to the previous release, plus a compatibility matrix 
>   test that runs the N-1 Analytics client against the current Sidecar server.
> - Stability annotations: adopt @PublicApi / @InternalApi / @Stable / 
>   @Evolving / @Deprecated annotations on the Sidecar API surface, following 
>   the pattern established by Kafka and Elasticsearch. This makes the contract 
>   explicit and discoverable in code: a developer touching an annotated method 
>   immediately sees its stability guarantee and since which version it has 
>   been public.
>
> The three layers are complementary: the versioning model defines the promise, 
> annotations mark the contract in code, and CI enforces the promise 
> mechanically. The unified release cadence ensures the promise is always 
> evaluated as a whole.
>
> As a side note — Cassandra core currently lacks this kind of API stability 
> clarity, which creates real friction for downstream projects. Establishing 
> this practice in the companion project gives us a concrete, working reference 
> that could motivate and inform a broader Cassandra core evolution down the 
> road. Happy to discuss that separately if there is interest.
>
> Looking forward to hearing everyone's thoughts.
>
> Thanks
> - Yifan
>
> On Wed, Jun 3, 2026 at 11:32 PM Štefan Miklošovič <[email protected]> 
> wrote:
>>
>> Hi Jeremiah,
>>
>> for now, what I find difficult, and what I keep asking myself, is
>> "what version of Sidecar can I run with Analytics?" Is
>> Sidecar 0.2.0 compatible with Analytics 0.4.0? We just don't know
>> until we run it and try. There is no compatibility matrix for what
>> goes with what. If each component is developed independently then I
>> think it will be more messy than if it was released in lock-step.
>>
>> We might establish a policy that e.g. any patch release of Sidecar is
>> compatible with the Analytics release of the same minor. For example,
>> we release both Sidecar and Analytics under unified version 1.0.0.
>> Then we will release 1.0.5 of both next. So we can say that Sidecar
>> 1.0.5 is compatible with Analytics 1.0.0. Or Sidecar 1.1.5 is
>> compatible with Analytics 1.1.0. Basically, Sidecar is a standalone
>> server app a user can run without Analytics, but once they are
>> interested in the Analytics combo, they would need to run it with the
>> respective Analytics release.
>>
>> If we release Analytics and Sidecar 1.1.0 and you have Sidecar 1.0.5
>> then you would need to upgrade to 1.1.0 to be sure that it is
>> compatible with Analytics 100% while you could just bump patch
>> releases for Sidecar endlessly if you are interested in Sidecar
>> without Analytics.
>>
>> This would of course mean that there would need to be awareness of
>> "will this patch I want to ship to Sidecar work with the related
>> Analytics minor version when we release it?". We might also say that a
>> new REST endpoint can go only into a new minor version, and similar.
>>
>> This was, of course, just an example and it is all tweakable.
>>
>> On Wed, Jun 3, 2026 at 11:44 PM Jeremiah Jordan <[email protected]> wrote:
>> >>
>> >>  I worry if we move into the Sidecar repo it's just going to become more 
>> >> coupled and folks in the community are already using Analytics to read 
>> >> from e.g. S3 buckets or other data sources.
>> >
>> >
>> > I have similar concerns.  If we start releasing them in lockstep from the 
>> > same repo, then I worry that people will start making breaking changes to 
>> > sidecar APIs such that existing Analytics jars out in the wild will not 
>> > work, without realizing it.
>> >
>> > Both cassandra-analytics and the cassandra-sidecar are starting to be used 
>> > out in the world by people in production settings.  My expectation for 
>> > updates to the sidecar APIs is that anything done should not break 
>> > existing clients.  When the client and the server are in different repos, 
>> > it is much cleaner and clearer to people that you are exposing an API 
>> > surface which is being consumed externally, and you need to keep things 
>> > like backwards compatibility in mind.  If the client and the server live 
>> > in the same repo, and are released together, I can see people just 
>> > changing/refactoring both and not considering existing clients out in the 
>> > wild.  I think them being in separate repos makes that distinction clearer 
>> > to someone working on a new feature that spans both code bases.
>> >
>> > Seems like many here want them in the same repo, so I won’t block that, 
>> > but I have concerns.
>> >
>> > If we do decide to merge them, I think it should be in a new repo with a 
>> > new name.  I do not think the sidecar belongs in a repo named analytics, 
>> > or the analytics library belongs in a repo named sidecar.  They both have 
>> > use cases that do not involve the other.
>> >
>> > -Jeremiah Jordan
>> >
>> >
>> > On Jun 3, 2026 at 11:42:15 AM, James Berragan <[email protected]> wrote:
>> >>
>> >> Can we break down a bit more where the circular dependency lies? I'm not 
>> >> against it, I just want to make sure we're solving the right problem 
>> >> here. Analytics and CDC were always designed to be agnostic of the 
>> >> Sidecar. What stops us moving just the Sidecar specific parts into the 
>> >> Sidecar repo? I worry if we move into the Sidecar repo it's just going to 
>> >> become more coupled and folks in the community are already using 
>> >> Analytics to read from e.g. S3 buckets or other data sources.
>> >>
>> >> James.
>> >>
>> >> On Tue, 2 Jun 2026 at 13:20, Josh McKenzie <[email protected]> wrote:
>> >>>
>> >>> I'd like to propose we merge the cassandra-sidecar and 
>> >>> cassandra-analytics repositories. I've shopped the idea around to some 
>> >>> of you and gotten universally positive feedback with some questions 
>> >>> about details we deferred to this discussion.
>> >>>
>> >>> Reasons we should merge:
>> >>>
>> >>> - Break circular dependencies between the 2 projects
>> >>> - Remove redundant copy/pasted code
>> >>> - Simplify build and CI
>> >>> - Reduce friction on changes that span both projects
>> >>> - Simplify the CDC implementation
>> >>>
>> >>>
>> >>> Outstanding questions and observations that came up:
>> >>>
>> >>> - Do we merge one repository into the other? Or do we create a new 
>> >>>   project and bring them both in?
>> >>> - What do we do about JIRA? Leave separate or combine?
>> >>> - What do we do with open issues and PRs in GitHub?
>> >>> - We'll need to thoughtfully update CI (GitHub + CircleCI) since we're 
>> >>>   right at the limit on the free tier on both projects
>> >>> - What do we do about existing deprecated repositories 
>> >>>   (cassandra-analytics and/or cassandra-sidecar)?
>> >>> - We'll need to update our release process
>> >>>
>> >>>
>> >>> Other observations or questions welcome, as are thoughts on the entire 
>> >>> process, on the outstanding questions, etc.
>> >>>
>> >>> Looking forward to the discussion everyone.
>> >>>
>> >>> ~Josh
