Re: [DISCUSS] Proposal: We should merge the cassandra-analytics and cassandra-sidecar repositories

Patrick McFadin Thu, 04 Jun 2026 09:36:56 -0700

+1 on cassandra-ecosystem. Cassandra-buddy would be fun, but sadly,
ecosystem is more on brand for what this needs to be.


+1 on a CEP just as a matter of record and consensus we can point people to
when they want to participate.

Patrick

On Thu, Jun 4, 2026 at 9:32 AM Yifan Cai <[email protected]> wrote:

> Happy to go with *cassandra-ecosystem*. The community enthusiasm for the
> name is a good signal in itself.
> The one mild concern I had was that "ecosystem" could imply Cassandra core
> is included in scope, but I think that is easily addressed with a clear
> repository description and README introduction. Consider my earlier
> suggestion withdrawn.
>
> A CEP is a great idea, and it doesn't need to be exhaustive. It is a place
> to record the decisions made in this thread, so that they are explicitly
> committed to rather than informally agreed upon in a mailing list thread.
> It also directly addresses Jeremiah's concern: the stability annotations
> and CI enforcement mechanisms we discussed are exactly the kind of promises
> that belong in a CEP, where new contributors can find them and understand
> the expectations from day one.
>
> - Yifan
>
> On Thu, Jun 4, 2026 at 7:33 AM Ekaterina Dimitrova <[email protected]>
> wrote:
>
>> The proposal for a CEP comes from the outcome I see coming from this
>> valuable discussion - people overall agree a merge is valuable as long as
>> the concerns outlined are hashed out.
>>
>> On Thu, 4 Jun 2026 at 10:28, Ekaterina Dimitrova <[email protected]>
>> wrote:
>>
>>> Is this CEP-worthy?
>>>
>>> To outline all concerns and expectations?
>>> - backwards compatibility
>>> - releases
>>> - API
>>> - repos
>>> - Jira
>>> - CI
>>> Etc
>>>
>>> It can help us also to make some promises and work towards them;
>>> document them more explicitly and make it easier for anyone new starting to
>>> find out what the expectations are.  Does it make sense?
>>>
>>> I mean it doesn’t have to be a 10-page CEP
>>>
>>>
>>> On Thu, 4 Jun 2026 at 9:58, Josh McKenzie <[email protected]> wrote:
>>>
>>>> I prefer cassandra-ecosystem over cassandra-companion. Keeps our
>>>> options more open going forward (i.e. is a driver a companion? ... no?)
>>>>
>>>> To your point Jeremiah, while you'd think having the 2 projects in
>>>> separate repos would force us to have cleaner APIs defined between them and
>>>> versioning, in practice that's not the case today. The discipline / energy
>>>> required to define a clear API boundary and rev it is probably comparable
>>>> between the 2 paradigms (i.e. status quo dual repo: less discipline
>>>> required, more energy, monorepo: more discipline required, less energy). At
>>>> the end of the day I'd posit this is something we've been very poor at as a
>>>> community across our entire ecosystem. This will be a new muscle for us to
>>>> build regardless of how the repos are set up.
>>>>
>>>> Ideally the 2 projects would be independent of one another and have a
>>>> shared artifact they both depend upon and that API is how we specify
>>>> compatibility. That should be relatively straightforward to do in a
>>>> monorepo w/some refactoring, and if we can get to a shared library we
>>>> publish from a cassandra-ecosystem repo, we can version that and then it's
>>>> as simple as "if projects you're working with support the same shared
>>>> library version, they are compatible".
>>>>
>>>> As I write that out, it strikes me that the shared information between
>>>> them could in theory one day be promoted to a higher architectural tier of
>>>> shared library where we factor out shared code from analytics and the
>>>> sidecar, and we factor out shared code from core Cassandra that the
>>>> ecosystem depends on (i.e. "cassandra-shared", or "cassandra-lib"). Then
>>>> all 3 projects (+ drivers?) could take a dependency on that shared library,
>>>> we rev the version of that, and compatibility is defined by that shared
>>>> substrate.
>>>>
>>>> All very "long term down the road" considerations, but the shape of
>>>> "get things closer together so they're easier to mutate and work with, then
>>>> massage the structure and dependencies to make the boundaries and
>>>> versioning clear through implicit structure" appeals to me.
>>>>
>>>> On Thu, Jun 4, 2026, at 6:00 AM, Shailaja Koppu wrote:
>>>>
>>>> - I like the name cassandra-ecosystem
>>>> - We cannot draw a dependency direction between Analytics and Sidecar.
>>>> With the Analytics on S3 feature, Analytics can work without Sidecar. Sidecar
>>>> has many features that have nothing to do with Analytics. So both can be
>>>> independent of each other.
>>>> - The name cassandra-ecosystem allows us to integrate more such
>>>> features/components into the repo
>>>>
>>>>
>>>>
>>>> > On Jun 4, 2026, at 10:50 AM, Štefan Miklošovič <
>>>> [email protected]> wrote:
>>>> >
>>>> > That all makes sense, Yifan.
>>>> >
>>>> > The only issue, which is less an issue than a consequence of doing it
>>>> > like that, is this. Imagine that there is a change in Analytics but
>>>> > none in Sidecar and we release a new version. That means that
>>>> > Analytics would contain a new patch but Sidecar would be a "dummy"
>>>> > release. We would bump the version of Sidecar just for the sake of it.
>>>> > Then people trying to investigate what has changed between these
>>>> > versions would realize that, awkwardly, nothing changed.
>>>> >
>>>> > I can live with it. It is just something to be aware of.
>>>> >
>>>> > On Thu, Jun 4, 2026 at 9:42 AM Yifan Cai <[email protected]> wrote:
>>>> >>
>>>> >> Hi all,
>>>> >>
>>>> >> Thanks for the great discussion so far. A few thoughts on the open
>>>> questions:
>>>> >>
>>>> >> Naming
>>>> >>
>>>> >> I'd like to suggest cassandra-companion as the name for the merged
>>>> repository. Both existing names create confusion in opposite directions:
>>>> operational features like rolling restart and health monitoring feel out of
>>>> place in cassandra-analytics (Joey's point), while a bulk read/write
>>>> connector library feels out of place in cassandra-sidecar. A new neutral
>>>> name avoids subordinating either project's identity to the other, and is
>>>> broad enough to accommodate future additions beyond Analytics and Sidecar,
>>>> without implying Cassandra core is included, as names like
>>>> cassandra-ecosystem or cassandra-platform might.
>>>> >>
>>>> >> For the JIRA project key, CASSCOMP would be a natural fit.
>>>> >>
>>>> >> API Compatibility
>>>> >>
>>>> >> Jeremiah raises a valid concern — co-locating the client and server
>>>> removes the repo boundary that previously reminded developers they are
>>>> touching a public API surface. Štefan's versioning model addresses the
>>>> consumer-facing question ("what runs with what") well, but we also need
>>>> developer-facing guardrails to mechanically enforce the promise. I'd
>>>> propose combining three layers:
>>>> >>
>>>> >> - Versioning contract (Štefan's model): same major.minor guarantees a
>>>> compatible Analytics/Sidecar pair; patch releases of Sidecar are safe to
>>>> advance independently; new REST endpoints require a minor bump
>>>> >> - Unified version and release cadence: all modules release together
>>>> under the same version number. This directly aligns with the merge's core
>>>> motivation of reducing coordination overhead. The alternative, independent
>>>> module versioning within the monorepo, would essentially recreate the
>>>> cross-repo coordination friction we are trying to eliminate. Conveniently,
>>>> Analytics and Sidecar are currently at the same version number, so there is
>>>> no awkward jump or reset needed at the point of merge.
>>>> >> - CI enforcement: an OpenAPI contract test that fails if a change
>>>> breaks the API surface relative to the previous release, plus a
>>>> compatibility matrix test that runs the N-1 Analytics client against the
>>>> current Sidecar server
>>>> >> - Stability annotations: adopt @PublicApi / @InternalApi / @Stable /
>>>> @Evolving / @Deprecated annotations on the Sidecar API surface, following
>>>> the pattern established by Kafka and Elasticsearch. This makes the contract
>>>> explicit and discoverable in code: a developer touching an annotated
>>>> method immediately sees its stability guarantee and since which version it
>>>> has been public
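
To make the stability-annotation item concrete, here is a minimal Java sketch. The thread names the annotations but does not define them, so the definitions below (retention, targets, the `since` element) are assumptions, and `ExampleRingHandler` and `ApiSurface` are hypothetical names used only for illustration, not existing Sidecar classes.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper that lists the annotated public surface of a class. */
final class ApiSurface {
    private ApiSurface() {}

    /** Maps each @PublicApi method name to the version it has been public since. */
    static SortedMap<String, String> publicMethods(Class<?> type) {
        SortedMap<String, String> result = new TreeMap<>();
        for (Method method : type.getDeclaredMethods()) {
            PublicApi api = method.getAnnotation(PublicApi.class);
            if (api != null) {
                result.put(method.getName(), api.since());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(publicMethods(ExampleRingHandler.class));
    }
}

/** Marks a type or method as part of the public API surface (assumed definition). */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface PublicApi {
    /** First release in which the element was public. */
    String since();
}

/** Marks a public element that may still change (assumed semantics). */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Evolving {
}

/** Hypothetical handler, used only to show the annotations in place. */
final class ExampleRingHandler {
    @PublicApi(since = "1.0.0")
    public String ring(String keyspace) {
        return "ring:" + keyspace;
    }

    @PublicApi(since = "1.1.0")
    @Evolving
    public String ringSummary(String keyspace) {
        return "summary:" + keyspace;
    }

    // No annotation: treated as internal in this sketch.
    String formatInternal(String value) {
        return value.trim();
    }
}
```

Runtime retention is chosen here so that a test could enumerate the annotated surface by reflection; a source-only or class-level retention would work just as well if the check were done by a build plugin instead.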
>>>> >>
>>>> >> The three layers are complementary: the versioning model defines the
>>>> promise, annotations mark the contract in code, and CI enforces the promise
>>>> mechanically. The unified release cadence ensures the promise is always
>>>> evaluated as a whole.
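
As a rough illustration of "CI enforces the promise mechanically", the Java sketch below reduces the contract test to its simplest form: it reports endpoints that existed in the previous release but are missing from the current one. This is a simplification and an assumption, not the proposed test itself. A real OpenAPI contract test would also compare parameters and schemas, and the endpoint strings and the `ContractCheck` name are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical check: flags endpoints dropped since the previous release. */
final class ContractCheck {
    private ContractCheck() {}

    /** Returns endpoints present in the previous release but absent from the current one. */
    static Set<String> removedEndpoints(Set<String> previous, Set<String> current) {
        Set<String> removed = new TreeSet<>(previous);
        removed.removeAll(current);
        return removed;
    }

    public static void main(String[] args) {
        // Placeholder endpoint names; in CI these would be read from the two specs.
        Set<String> previous = Set.of("GET /example/a", "GET /example/b");
        Set<String> current = Set.of("GET /example/a", "GET /example/c");

        Set<String> removed = removedEndpoints(previous, current);
        // A CI job would fail the build when this set is not empty.
        System.out.println("Removed endpoints: " + removed);
    }
}
```

Adding an endpoint passes this check while removing one does not, which matches the rule in the versioning contract that new endpoints are allowed in a minor bump but existing clients must keep working.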
>>>> >>
>>>> >> As a side note — Cassandra core currently lacks this kind of API
>>>> stability clarity, which creates real friction for downstream projects.
>>>> Establishing this practice in the companion project gives us a concrete,
>>>> working reference that could motivate and inform a broader Cassandra core
>>>> evolution down the road. Happy to discuss that separately if there is
>>>> interest.
>>>> >>
>>>> >> Looking forward to hearing everyone's thoughts.
>>>> >>
>>>> >> Thanks
>>>> >> - Yifan
>>>> >>
>>>> >> On Wed, Jun 3, 2026 at 11:32 PM Štefan Miklošovič <
>>>> [email protected]> wrote:
>>>> >>>
>>>> >>> Hi Jeremiah,
>>>> >>>
>>>> >>> for now, what I find difficult and I found myself questioning this
>>>> >>> repeatedly is "what version of Sidecar can I run with Analytics?" Is
>>>> >>> Sidecar 0.2.0 compatible with Analytics 0.4.0? We just don't know
>>>> >>> until we run it and try. There is no compatibility matrix for what
>>>> >>> goes with what. If each component is developed independently then I
>>>> >>> think it will be more messy than if it was released in lock-step.
>>>> >>>
>>>> >>> We might establish a policy that e.g. a patch release of Sidecar is
>>>> >>> compatible with whatever minor in Analytics. For example, we release
>>>> >>> both Sidecar and Analytics under unified version 1.0.0. Then we will
>>>> >>> release 1.0.5 of both next. So we can say that Sidecar 1.0.5 is
>>>> >>> compatible with Analytics 1.0.0. Or Sidecar 1.1.5 is compatible with
>>>> >>> Analytics 1.1.0. Basically, Sidecar is a standalone server app a user
>>>> >>> can run without Analytics, but once they are interested in the Analytics
>>>> >>> combo, they would need to run it with the respective Analytics releases.
>>>> >>>
>>>> >>> If we release Analytics and Sidecar 1.1.0 and you have Sidecar 1.0.5
>>>> >>> then you would need to upgrade to 1.1.0 to be sure that it is
>>>> >>> compatible with Analytics 100% while you could just bump patch
>>>> >>> releases for Sidecar endlessly if you are interested in Sidecar
>>>> >>> without Analytics.
>>>> >>>
>>>> >>> This would of course mean that there would need to be awareness in
>>>> >>> "will this patch I want to ship to Sidecar work in related Analytics
>>>> >>> minor version when we release it?". We might also say that a new REST
>>>> >>> endpoint can go only into a new minor version and similar.
>>>> >>>
>>>> >>> This was, of course, just an example and it is all tweakable.
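
The example policy above (a Sidecar patch release is compatible with Analytics on the same major.minor line) can be sketched in Java as follows. This is only an illustration of the rule as stated in this message, assuming plain `major.minor.patch` version strings; `CompatibilityCheck` is a hypothetical name, not an existing class in either project.

```java
/** Hypothetical helper illustrating the "same major.minor" compatibility rule. */
final class CompatibilityCheck {
    private CompatibilityCheck() {}

    /** Parses "major.minor.patch" into three integers. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected major.minor.patch, got: " + version);
        }
        int[] parsed = new int[3];
        for (int i = 0; i < 3; i++) {
            parsed[i] = Integer.parseInt(parts[i]);
        }
        return parsed;
    }

    /** True when both versions share major and minor; the patch level is ignored. */
    static boolean isCompatible(String sidecarVersion, String analyticsVersion) {
        int[] sidecar = parse(sidecarVersion);
        int[] analytics = parse(analyticsVersion);
        return sidecar[0] == analytics[0] && sidecar[1] == analytics[1];
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("1.0.5", "1.0.0")); // true: both on the 1.0 line
        System.out.println(isCompatible("1.1.5", "1.1.0")); // true: both on the 1.1 line
        System.out.println(isCompatible("1.0.5", "1.1.0")); // false: minor versions differ
    }
}
```

The third call mirrors the case described above: someone on Sidecar 1.0.5 would need to move to the 1.1 line before relying on Analytics 1.1.0.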
>>>> >>>
>>>> >>> On Wed, Jun 3, 2026 at 11:44 PM Jeremiah Jordan <
>>>> [email protected]> wrote:
>>>> >>>>>
>>>> >>>>> I worry if we move into the Sidecar repo it's just going to
>>>> become more coupled and folks in the community are already using Analytics
>>>> to read from e.g. S3 buckets or other data sources.
>>>> >>>>
>>>> >>>>
>>>> >>>> I have similar concerns.  If we start releasing them in lockstep
>>>> from the same repo, then I worry that people will start making breaking
>>>> changes to sidecar APIs such that existing Analytics jars out in the wild
>>>> will not work, without realizing it.
>>>> >>>>
>>>> >>>> Both cassandra-analytics and the cassandra-sidecar are starting to
>>>> be used out in the world by people in production settings.  My expectation
>>>> for updates to the sidecar APIs is that anything done should not break
>>>> existing clients, when the client and the server are in different repos, it
>>>> is much cleaner and clearer to people that you are exposing an API surface
>>>> which is being consumed externally, and you need to keep things like
>>>> backwards compatibility in mind.  If the client and the server live in the
>>>> same repo, and are released together, I can see people just
>>>> changing/refactoring both and not considering existing clients out in the
>>>> wild.  I think them being in separate repos makes that distinction clearer
>>>> to someone working on a new feature that spans both code bases.
>>>> >>>>
>>>> >>>> Seems like many here want them in the same repo, so I won’t block
>>>> that, but I have concerns.
>>>> >>>>
>>>> >>>> If we do decide to merge them, I think it should be in a new repo
>>>> with a new name.  I do not think the sidecar belongs in a repo named
>>>> analytics, or the analytics library belongs in a repo named sidecar.  They
>>>> both have use cases that do not involve the other.
>>>> >>>>
>>>> >>>> -Jeremiah Jordan
>>>> >>>>
>>>> >>>>
>>>> >>>> On Jun 3, 2026 at 11:42:15 AM, James Berragan <[email protected]>
>>>> wrote:
>>>> >>>>>
>>>> >>>>> Can we break down a bit more where the circular dependency lies?
>>>> I'm not against it; I just want to make sure we're solving the right
>>>> problem here. Analytics and CDC were always designed to be agnostic of the
>>>> Sidecar. What stops us moving just the Sidecar specific parts into the
>>>> Sidecar repo? I worry if we move into the Sidecar repo it's just going to
>>>> become more coupled and folks in the community are already using Analytics
>>>> to read from e.g. S3 buckets or other data sources.
>>>> >>>>>
>>>> >>>>> James.
>>>> >>>>>
>>>> >>>>> On Tue, 2 Jun 2026 at 13:20, Josh McKenzie <[email protected]>
>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> I'd like to propose we merge the cassandra-sidecar and
>>>> cassandra-analytics repositories. I've shopped the idea around to some of
>>>> you and gotten universally positive feedback with some questions about
>>>> details we deferred to this discussion.
>>>> >>>>>>
>>>> >>>>>> Reasons we should merge:
>>>> >>>>>>
>>>> >>>>>> - Break circular dependencies between the 2 projects
>>>> >>>>>> - Remove redundant copy/pasted code
>>>> >>>>>> - Simplify build and CI
>>>> >>>>>> - Reduce friction on changes that span both projects
>>>> >>>>>> - Simplify the CDC implementation
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Outstanding questions and observations that came up:
>>>> >>>>>>
>>>> >>>>>> - Do we merge one repository into the other? Or do we create a new
>>>> project and bring them both in?
>>>> >>>>>> - What do we do about JIRA? Leave separate or combine?
>>>> >>>>>> - What do we do with open issues and PRs in GitHub?
>>>> >>>>>> - We'll need to thoughtfully update CI (GitHub + CircleCI) since
>>>> we're right at the limit on the free tier on both projects
>>>> >>>>>> - What do we do about existing deprecated repositories
>>>> (cassandra-analytics and/or cassandra-sidecar)?
>>>> >>>>>> - We'll need to update our release process
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Other observations or questions welcome, as are thoughts on the
>>>> entire process, on the outstanding questions, etc.
>>>> >>>>>>
>>>> >>>>>> Looking forward to the discussion everyone.
>>>> >>>>>>
>>>> >>>>>> ~Josh
>>>>
>>>>
>>>>
>>>>
