Re: [DISCUSS] Proposal: We should merge the cassandra-analytics and cassandra-sidecar repositories

Ekaterina Dimitrova Thu, 04 Jun 2026 07:33:16 -0700

I am proposing a CEP because of where I see this valuable discussion
heading: people overall agree a merge is worthwhile as long as the
concerns outlined are hashed out.


On Thu, 4 Jun 2026 at 10:28, Ekaterina Dimitrova <[email protected]>
wrote:

> Is this CEP-worthy?
>
> To outline all concerns and expectations?
> - backwards compatibility
> - releases
> - API
> - repos
> - Jira
> - CI
> Etc
>
> It can help us also to make some promises and work towards them; document
> them more explicitly and make it easier for anyone new starting to find out
> what the expectations are.  Does it make sense?
>
> I mean, it doesn’t have to be a 10-page CEP.
>
>
> On Thu, 4 Jun 2026 at 9:58, Josh McKenzie <[email protected]> wrote:
>
>> I prefer cassandra-ecosystem over cassandra-companion. Keeps our options
>> more open going forward (i.e. is a driver a companion? ... no?)
>>
>> To your point Jeremiah, while you'd think having the 2 projects in
>> separate repos would force us to have cleaner APIs defined between them and
>> versioning, in practice that's not the case today. The discipline / energy
>> required to define a clear API boundary and rev it is probably comparable
>> between the 2 paradigms (i.e. status quo dual repo: less discipline
>> required, more energy, monorepo: more discipline required, less energy). At
>> the end of the day I'd posit this is something we've been very poor at as a
>> community across our entire ecosystem. This will be a new muscle for us to
>> build regardless of how the repos are set up.
>>
>> Ideally the 2 projects would be independent of one another and have a
>> shared artifact they both depend upon and that API is how we specify
>> compatibility. That should be relatively straightforward to do in a
>> monorepo w/some refactoring, and if we can get to a shared library we
>> publish from a cassandra-ecosystem repo, we can version that and then it's
>> as simple as "if projects you're working with support the same shared
>> library version, they are compatible".
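The rule quoted above ("if projects you're working with support the same shared library version, they are compatible") can be sketched as a set intersection. This is only an illustration under stated assumptions: the class name, method names, and version strings are hypothetical, and no such shared library exists yet.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; it illustrates the rule and is not part of any Cassandra project. */
final class SharedLibraryCompatibility {

    /** Returns the shared-library versions that both projects declare support for. */
    static Set<String> commonVersions(Set<String> first, Set<String> second) {
        Set<String> common = new TreeSet<>(first);
        common.retainAll(second);
        return common;
    }

    /** Two projects count as compatible when they share at least one supported version. */
    static boolean compatible(Set<String> first, Set<String> second) {
        return !commonVersions(first, second).isEmpty();
    }

    public static void main(String[] args) {
        // Assumed example data: the versions each project claims to support.
        Set<String> analytics = Set.of("2.0", "2.1");
        Set<String> sidecar = Set.of("2.1", "3.0");
        System.out.println(compatible(analytics, sidecar)); // true, both support 2.1
    }
}
```

Under this sketch each project would publish the set of shared-library versions it supports, and tooling or users would only need to compare those sets.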
>>
>> As I write that out, it strikes me that the shared information between
>> them could in theory one day be promoted to a higher architectural tier of
>> shared library where we factor out shared code from analytics and the
>> sidecar, and we factor out shared code from core Cassandra that the
>> ecosystem depends on (i.e. "cassandra-shared", or "cassandra-lib"). Then
>> all 3 projects (+ drivers?) could take a dependency on that shared library,
>> we rev the version of that, and compatibility is defined by that shared
>> substrate.
>>
>> All very "long term down the road" considerations, but the shape of "get
>> things closer together so they're easier to mutate and work with, then
>> massage the structure and dependencies to make the boundaries and
>> versioning clear through implicit structure" appeals to me.
>>
>> On Thu, Jun 4, 2026, at 6:00 AM, Shailaja Koppu wrote:
>>
>> - I like the name cassandra-ecosystem
>> - We cannot draw a dependency direction between Analytics and Sidecar.
>> With the Analytics on S3 feature, Analytics can work without Sidecar.
>> Sidecar has many features that have nothing to do with Analytics. So both
>> can be independent of each other.
>> - The name cassandra-ecosystem allows us to integrate more such
>> features/components into the repo
>>
>>
>>
>> > On Jun 4, 2026, at 10:50 AM, Štefan Miklošovič <[email protected]>
>> wrote:
>> >
>> > That all makes sense, Yifan.
>> >
>> > The only issue, which is less an issue than a consequence
>> > of doing it like that: imagine that there is a change in Analytics but
>> > none in Sidecar and we release a new version. That means that
>> > Analytics would contain a new patch but Sidecar would be a "dummy"
>> > release. We would bump the version of Sidecar just for the sake of it.
>> > Then people trying to investigate what has changed between these
>> > versions would realize that, awkwardly, nothing changed.
>> >
>> > I can live with it. It is just something to be aware of.
>> >
>> > On Thu, Jun 4, 2026 at 9:42 AM Yifan Cai <[email protected]> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> Thanks for the great discussion so far. A few thoughts on the open
>> >> questions:
>> >>
>> >> Naming
>> >>
>> >> I'd like to suggest cassandra-companion as the name for the merged
>> >> repository. Both existing names create confusion in opposite directions:
>> >> operational features like rolling restart and health monitoring feel out
>> >> of place in cassandra-analytics (Joey's point), while a bulk read/write
>> >> connector library feels out of place in cassandra-sidecar. A new neutral
>> >> name avoids subordinating either project's identity to the other, and is
>> >> broad enough to accommodate future additions beyond Analytics and
>> >> Sidecar, without implying Cassandra core is included, as names like
>> >> cassandra-ecosystem or cassandra-platform might.
>> >>
>> >> For the JIRA project key, CASSCOMP would be a natural fit.
>> >>
>> >> API Compatibility
>> >>
>> >> Jeremiah raises a valid concern: co-locating the client and server
>> >> removes the repo boundary that previously reminded developers they are
>> >> touching a public API surface. Štefan's versioning model addresses the
>> >> consumer-facing question ("what runs with what") well, but we also need
>> >> developer-facing guardrails to mechanically enforce the promise. I'd
>> >> propose combining three layers:
>> >>
>> >> - Versioning contract (Štefan's model): same major.minor guarantees a
>> >>   compatible Analytics/Sidecar pair; patch releases of Sidecar are safe
>> >>   to advance independently; new REST endpoints require a minor bump.
>> >> - Unified version and release cadence: all modules release together
>> >>   under the same version number. This directly aligns with the merge's
>> >>   core motivation of reducing coordination overhead. The alternative,
>> >>   independent module versioning within the monorepo, would essentially
>> >>   recreate the cross-repo coordination friction we are trying to
>> >>   eliminate. Conveniently, Analytics and Sidecar are currently at the
>> >>   same version number, so there is no awkward jump or reset needed at
>> >>   the point of merge.
>> >> - CI enforcement: an OpenAPI contract test that fails if a change breaks
>> >>   the API surface relative to the previous release, plus a compatibility
>> >>   matrix test that runs the N-1 Analytics client against the current
>> >>   Sidecar server.
>> >> - Stability annotations: adopt @PublicApi / @InternalApi / @Stable /
>> >>   @Evolving / @Deprecated annotations on the Sidecar API surface,
>> >>   following the pattern established by Kafka and Elasticsearch. This
>> >>   makes the contract explicit and discoverable in code: a developer
>> >>   touching an annotated method immediately sees its stability guarantee
>> >>   and since which version it has been public.
>> >>
>> >> The three layers are complementary: the versioning model defines the
>> >> promise, annotations mark the contract in code, and CI enforces the
>> >> promise mechanically. The unified release cadence ensures the promise is
>> >> always evaluated as a whole.
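As a rough illustration of the stability-annotation layer described above, here is a minimal sketch. The annotation names @PublicApi and @InternalApi come from the message; their definitions, the `since` attribute, the `ExampleSidecarClient` interface, and the `PublicApiScanner` helper are all hypothetical and exist only for this example.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical marker for API that external clients may rely on. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface PublicApi {
    /** First release in which the element was public (assumed attribute). */
    String since();
}

/** Hypothetical marker for API that may change without notice. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface InternalApi {
}

/** Made-up client interface, used only to show the annotations in place. */
interface ExampleSidecarClient {
    @PublicApi(since = "1.0.0")
    String ringInfo(String keyspace);

    @InternalApi
    void refreshInternalCache();
}

/** Collects the methods marked @PublicApi so a CI check could compare them across releases. */
final class PublicApiScanner {
    static Map<String, String> publicMethods(Class<?> type) {
        Map<String, String> result = new TreeMap<>();
        for (Method method : type.getDeclaredMethods()) {
            PublicApi marker = method.getAnnotation(PublicApi.class);
            if (marker != null) {
                result.put(method.getName(), marker.since());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(publicMethods(ExampleSidecarClient.class)); // {ringInfo=1.0.0}
    }
}
```

Runtime retention is one possible design choice: it lets a test enumerate the annotated surface by reflection and fail the build when a method listed for the previous release disappears.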
>> >>
>> >> As a side note, Cassandra core currently lacks this kind of API
>> >> stability clarity, which creates real friction for downstream projects.
>> >> Establishing this practice in the companion project gives us a concrete,
>> >> working reference that could motivate and inform a broader Cassandra
>> >> core evolution down the road. Happy to discuss that separately if there
>> >> is interest.
>> >>
>> >> Looking forward to hearing everyone's thoughts.
>> >>
>> >> Thanks
>> >> - Yifan
>> >>
>> >> On Wed, Jun 3, 2026 at 11:32 PM Štefan Miklošovič <
>> [email protected]> wrote:
>> >>>
>> >>> Hi Jeremiah,
>> >>>
>> >>> for now, what I find difficult and I found myself questioning this
>> >>> repeatedly is "what version of Sidecar can I run with Analytics?" Is
>> >>> Sidecar 0.2.0 compatible with Analytics 0.4.0? We just don't know
>> >>> until we run it and try. There is no compatibility matrix for what
>> >>> goes with what. If each component is developed independently then I
>> >>> think it will be more messy than if it was released in lock-step.
>> >>>
>> >>> We might establish a policy that e.g. a patch release of Sidecar is
>> >>> compatible with whatever minor in Analytics. For example, we release
>> >>> both Sidecar and Analytics under unified version 1.0.0. Then we will
>> >>> release 1.0.5 of both next. So we can say that Sidecar 1.0.5 is
>> >>> compatible with Analytics 1.0.0. Or Sidecar 1.1.5 is compatible with
>> >>> Analytics 1.1.0. Basically, Sidecar is a standalone server app a user
>> >>> can run without Analytics but once they are interested in Analytics
>> >>> combo, they would need to run with respective Analytics releases.
>> >>>
>> >>> If we release Analytics and Sidecar 1.1.0 and you have Sidecar 1.0.5
>> >>> then you would need to upgrade to 1.1.0 to be sure that it is
>> >>> compatible with Analytics 100% while you could just bump patch
>> >>> releases for Sidecar endlessly if you are interested in Sidecar
>> >>> without Analytics.
>> >>>
>> >>> This would of course mean that there would need to be awareness in
>> >>> "will this patch I want to ship to Sidecar work in related Analytics
>> >>> minor version when we release it?". We might also say that a new REST
>> >>> endpoint can go only into a new minor version and similar.
>> >>>
>> >>> This was, of course, just an example and it is all tweakable.
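The example policy above (a Sidecar/Analytics pair is compatible when major and minor match, while the patch level can differ) could be checked mechanically along these lines. This is a sketch of that one policy under the stated assumption of plain major.minor.patch version strings; the `ReleaseVersion` class is hypothetical and not part of either project.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value type for a major.minor.patch release version. */
final class ReleaseVersion {
    private static final Pattern FORMAT = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)");

    final int major;
    final int minor;
    final int patch;

    private ReleaseVersion(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    /** Parses strings such as "1.0.5"; anything else is rejected. */
    static ReleaseVersion parse(String text) {
        Matcher matcher = FORMAT.matcher(text);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Expected major.minor.patch, got: " + text);
        }
        return new ReleaseVersion(
                Integer.parseInt(matcher.group(1)),
                Integer.parseInt(matcher.group(2)),
                Integer.parseInt(matcher.group(3)));
    }

    /** Under the sketched policy only major and minor must match; patch is ignored. */
    boolean isCompatibleWith(ReleaseVersion other) {
        return major == other.major && minor == other.minor;
    }

    public static void main(String[] args) {
        ReleaseVersion sidecar = parse("1.0.5");
        System.out.println(sidecar.isCompatibleWith(parse("1.0.0"))); // true
        System.out.println(sidecar.isCompatibleWith(parse("1.1.0"))); // false, upgrade Sidecar to 1.1.x
    }
}
```

The two calls in `main` mirror the cases from the message: Sidecar 1.0.5 with Analytics 1.0.0, and Sidecar 1.0.5 with Analytics 1.1.0.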
>> >>>
>> >>> On Wed, Jun 3, 2026 at 11:44 PM Jeremiah Jordan <[email protected]>
>> wrote:
>> >>>>>
>> >>>>> I worry if we move into the Sidecar repo it's just going to become
>> more coupled and folks in the community are already using Analytics to read
>> from e.g. S3 buckets or other data sources.
>> >>>>
>> >>>>
>> >>>> I have similar concerns. If we start releasing them in lockstep
>> >>>> from the same repo, then I worry that people will start making
>> >>>> breaking changes to sidecar APIs such that existing Analytics jars out
>> >>>> in the wild will not work, without realizing it.
>> >>>>
>> >>>> Both cassandra-analytics and cassandra-sidecar are starting to be
>> >>>> used out in the world by people in production settings. My
>> >>>> expectation for updates to the sidecar APIs is that anything done
>> >>>> should not break existing clients. When the client and the server are
>> >>>> in different repos, it is much cleaner and clearer to people that you
>> >>>> are exposing an API surface which is being consumed externally, and
>> >>>> that you need to keep things like backwards compatibility in mind. If
>> >>>> the client and the server live in the same repo and are released
>> >>>> together, I can see people just changing/refactoring both and not
>> >>>> considering existing clients out in the wild. I think them being in
>> >>>> separate repos makes that distinction clearer to someone working on a
>> >>>> new feature that spans both code bases.
>> >>>>
>> >>>> Seems like many here want them in the same repo, so I won’t block
>> >>>> that, but I have concerns.
>> >>>>
>> >>>> If we do decide to merge them, I think it should be in a new repo
>> >>>> with a new name. I do not think the sidecar belongs in a repo named
>> >>>> analytics, or the analytics library belongs in a repo named sidecar.
>> >>>> They both have use cases that do not involve the other.
>> >>>>
>> >>>> -Jeremiah Jordan
>> >>>>
>> >>>>
>> >>>> On Jun 3, 2026 at 11:42:15 AM, James Berragan <[email protected]>
>> wrote:
>> >>>>>
>> >>>>> Can we break down a bit more where the circular dependency lies? I'm
>> >>>>> not against it; I just want to make sure we're solving the right
>> >>>>> problem here. Analytics and CDC were always designed to be agnostic of
>> >>>>> the Sidecar. What stops us from moving just the Sidecar-specific parts
>> >>>>> into the Sidecar repo? I worry that if we move into the Sidecar repo
>> >>>>> it's just going to become more coupled, and folks in the community are
>> >>>>> already using Analytics to read from e.g. S3 buckets or other data
>> >>>>> sources.
>> >>>>>
>> >>>>> James.
>> >>>>>
>> >>>>> On Tue, 2 Jun 2026 at 13:20, Josh McKenzie <[email protected]>
>> wrote:
>> >>>>>>
>> >>>>>> I'd like to propose we merge the cassandra-sidecar and
>> >>>>>> cassandra-analytics repositories. I've shopped the idea around to
>> >>>>>> some of you and gotten universally positive feedback with some
>> >>>>>> questions about details we deferred to this discussion.
>> >>>>>>
>> >>>>>> Reasons we should merge:
>> >>>>>>
>> >>>>>> - Break circular dependencies between the 2 projects
>> >>>>>> - Remove redundant copy/pasted code
>> >>>>>> - Simplify build and CI
>> >>>>>> - Reduce friction on changes that span both projects
>> >>>>>> - Simplify the CDC implementation
>> >>>>>>
>> >>>>>>
>> >>>>>> Outstanding questions and observations that came up:
>> >>>>>>
>> >>>>>> - Do we merge one repository into the other? Or do we create a new
>> >>>>>>   project and bring them both in?
>> >>>>>> - What do we do about JIRA? Leave separate or combine?
>> >>>>>> - What do we do with open issues and PRs in GitHub?
>> >>>>>> - We'll need to thoughtfully update CI (GitHub + CircleCI) since
>> >>>>>>   we're right at the limit on the free tier on both projects
>> >>>>>> - What do we do about existing deprecated repositories
>> >>>>>>   (cassandra-analytics and/or cassandra-sidecar)?
>> >>>>>> - We'll need to update our release process
>> >>>>>>
>> >>>>>>
>> >>>>>> Other observations or questions welcome, as are thoughts on the
>> >>>>>> entire process, on the outstanding questions, etc.
>> >>>>>>
>> >>>>>> Looking forward to the discussion everyone.
>> >>>>>>
>> >>>>>> ~Josh
>>
>>
>>
>>
