Is this worth a CEP, to outline all concerns and expectations?

- backwards compatibility
- releases
- API
- repos
- Jira
- CI

Etc.
It can also help us make some promises and work towards them, document them more explicitly, and make it easier for anyone new to find out what the expectations are.

Does it make sense? I mean, it doesn’t have to be a 10-page CEP.

On Thu, 4 Jun 2026 at 9:58, Josh McKenzie <[email protected]> wrote:

> I prefer cassandra-ecosystem over cassandra-companion. Keeps our options more open going forward (i.e. is a driver a companion? ... no?)
>
> To your point Jeremiah, while you'd think having the 2 projects in separate repos would force us to have cleaner APIs defined between them and versioning, in practice that's not the case today. The discipline / energy required to define a clear API boundary and rev it is probably comparable between the 2 paradigms (i.e. status quo dual repo: less discipline required, more energy, monorepo: more discipline required, less energy). At the end of the day I'd posit this is something we've been very poor at as a community across our entire ecosystem. This will be a new muscle for us to build regardless of how the repos are setup.
>
> Ideally the 2 projects would be independent of one another and have a shared artifact they both depend upon and that API is how we specify compatibility. That should be relatively straightforward to do in a monorepo w/some refactoring, and if we can get to a shared library we publish from a cassandra-ecosystem repo, we can version that and then it's as simple as "if projects you're working with support the same shared library version, they are compatible".
>
> As I write that out, it strikes me that the shared information between them could in theory one day be promoted to a higher architectural tier of shared library where we factor out shared code from analytics and the sidecar, and we factor out shared code from core Cassandra that the ecosystem depends on (i.e. "cassandra-shared", or "cassandra-lib"). Then all 3 projects (+ drivers?)
> could take a dependency on that shared library, we rev the version of that, and compatibility is defined by that shared substrate.
>
> All very "long term down the road" considerations, but the shape of "get things closer together so they're easier to mutate and work with, then massage the structure and dependencies to make the boundaries and versioning clear through implicit structure" appeals to me.
>
> On Thu, Jun 4, 2026, at 6:00 AM, Shailaja Koppu wrote:
>
> - I like the name cassandra-ecosystem
> - We cannot draw dependency direction between Analytics and Sidecar. With Analytics on S3 feature, Analytics can work without Sidecar. Sidecar has many features nothing to do with Analytics. So both can be independent of each other.
> - The name cassandra-ecosystem allows us to integrate more such features/components into the repo
>
> On Jun 4, 2026, at 10:50 AM, Štefan Miklošovič <[email protected]> wrote:
> >
> > That all makes sense, Yifan.
> >
> > The only issue, it is not actually an issue rather than a consequence of doing it like that. Imagine that there is a change in Analytics but none in Sidecar and we release a new version. That means that Analytics would contain a new patch but Sidecar would be a "dummy" release. We would bump the version of Sidecar just for the sake of it. Then people trying to investigate what has changed between these versions would realize that, awkwardly, nothing changed.
> >
> > I can live with it. It is just something to be aware of.
> >
> > On Thu, Jun 4, 2026 at 9:42 AM Yifan Cai <[email protected]> wrote:
> >>
> >> Hi all,
> >>
> >> Thanks for the great discussion so far. A few thoughts on the open questions:
> >>
> >> Naming
> >>
> >> I'd like to suggest cassandra-companion as the name for the merged repository.
> >> Both existing names create confusion in opposite directions: operational features like rolling restart and health monitoring feel out of place in cassandra-analytics (Joey's point), while a bulk read/write connector library feels out of place in cassandra-sidecar. A new neutral name avoids subordinating either project's identity to the other, and is broad enough to accommodate future additions beyond Analytics and Sidecar, without implying Cassandra core is included, as names like cassandra-ecosystem or cassandra-platform might.
> >>
> >> For the JIRA project key, CASSCOMP would be a natural fit.
> >>
> >> API Compatibility
> >>
> >> Jeremiah raises a valid concern — co-locating the client and server removes the repo boundary that previously reminded developers they are touching a public API surface. Štefan's versioning model addresses the consumer-facing question ("what runs with what") well, but we also need developer-facing guardrails to mechanically enforce the promise. I'd propose combining three layers:
> >>
> >> Versioning contract (Štefan's model): same major.minor guarantees a compatible Analytics/Sidecar pair; patch releases of Sidecar are safe to advance independently; new REST endpoints require a minor bump
> >> Unified version and release cadence: all modules release together under the same version number. This directly aligns with the merge's core motivation of reducing coordination overhead. The alternative, independent module versioning within the monorepo, would essentially recreate the cross-repo coordination friction we are trying to eliminate. Conveniently, Analytics and Sidecar are currently at the same version number, so there is no awkward jump or reset needed at the point of merge.
> >> CI enforcement: an OpenAPI contract test that fails if a change breaks the API surface relative to the previous release, plus a compatibility matrix test that runs the N-1 Analytics client against the current Sidecar server
> >> Stability annotations: adopt @PublicApi / @InternalApi / @Stable / @Evolving / @Deprecated annotations on the Sidecar API surface, following the pattern established by Kafka and Elasticsearch. This makes the contract explicit and discoverable in code — a developer touching an annotated method immediately sees its stability guarantee and since which version it has been public
> >>
> >> The three layers are complementary: the versioning model defines the promise, annotations mark the contract in code, and CI enforces the promise mechanically. The unified release cadence ensures the promise is always evaluated as a whole.
> >>
> >> As a side note — Cassandra core currently lacks this kind of API stability clarity, which creates real friction for downstream projects. Establishing this practice in the companion project gives us a concrete, working reference that could motivate and inform a broader Cassandra core evolution down the road. Happy to discuss that separately if there is interest.
> >>
> >> Looking forward to hearing everyone's thoughts.
> >>
> >> Thanks
> >> - Yifan
> >>
> >> On Wed, Jun 3, 2026 at 11:32 PM Štefan Miklošovič <[email protected]> wrote:
> >>>
> >>> Hi Jeremiah,
> >>>
> >>> for now, what I find difficult and I found myself questioning this repeatedly is "what version of Sidecar can I run with Analytics?" Is Sidecar 0.2.0 compatible with Analytics 0.4.0? We just don't know until we run it and try. There is no compatibility matrix for what goes with what. If each component is developed independently then I think it will be more messy than if it was released in lock-step.
> >>>
> >>> We might establish a policy that e.g.
> >>> a patch release of Sidecar is compatible with whatever minor in Analytics. For example, we release both Sidecar and Analytics under unified version 1.0.0. Then we will release 1.0.5 of both next. So we can say that Sidecar 1.0.5 is compatible with Analytics 1.0.0. Or Sidecar 1.1.5 is compatible with Analytics 1.1.0. Basically, Sidecar is a standalone server app a user can run without Analytics but once they are interested in Analytics combo, they would need to run with respective Analytics releases.
> >>>
> >>> If we release Analytics and Sidecar 1.1.0 and you have Sidecar 1.0.5 then you would need to upgrade to 1.1.0 to be sure that it is compatible with Analytics 100% while you could just bump patch releases for Sidecar endlessly if you are interested in Sidecar without Analytics.
> >>>
> >>> This would of course mean that there would need to be awareness in "will this patch I want to ship to Sidecar work in related Analytics minor version when we release it?". We might also say that a new REST endpoint can go only into a new minor version and similar.
> >>>
> >>> This was, of course, just an example and it is all tweakable.
> >>>
> >>> On Wed, Jun 3, 2026 at 11:44 PM Jeremiah Jordan <[email protected]> wrote:
> >>>>>
> >>>>> I worry if we move into the Sidecar repo it's just going to become more coupled and folks in the community are already using Analytics to read from e.g. S3 buckets or other data sources.
> >>>>
> >>>>
> >>>> I have similar concerns. If we start releasing them in lockstep from the same repo, then I worry that people will start making breaking changes to sidecar APIs such that existing Analytics jars out in the wild will not work, without realizing it.
> >>>>
> >>>> Both cassandra-analytics and the cassandra-sidecar are starting to be used out in the world by people in production settings.
> >>>> My expectation for updates to the sidecar APIs is that anything done should not break existing clients. When the client and the server are in different repos, it is much cleaner and clearer to people that you are exposing an API surface which is being consumed externally, and you need to keep things like backwards compatibility in mind. If the client and the server live in the same repo, and are released together, I can see people just changing/refactoring both and not considering existing clients out in the wild. I think them being in separate repos makes that distinction clearer to someone working on a new feature that spans both code bases.
> >>>>
> >>>> Seems like many here want them in the same repo, so I won’t block that, but I have concerns.
> >>>>
> >>>> If we do decide to merge them, I think it should be in a new repo with a new name. I do not think the sidecar belongs in a repo named analytics, or the analytics library belongs in a repo named sidecar. They both have use cases that do not involve the other.
> >>>>
> >>>> -Jeremiah Jordan
> >>>>
> >>>>
> >>>> On Jun 3, 2026 at 11:42:15 AM, James Berragan <[email protected]> wrote:
> >>>>>
> >>>>> Can we break down a bit more where the circular dependency lies? I'm not against it, I just want to make sure we're solving the right problem here. Analytics and CDC were always designed to be agnostic of the Sidecar. What stops us moving just the Sidecar specific parts into the Sidecar repo? I worry if we move into the Sidecar repo it's just going to become more coupled and folks in the community are already using Analytics to read from e.g. S3 buckets or other data sources.
> >>>>>
> >>>>> James.
> >>>>>
> >>>>> On Tue, 2 Jun 2026 at 13:20, Josh McKenzie <[email protected]> wrote:
> >>>>>>
> >>>>>> I'd like to propose we merge the cassandra-sidecar and cassandra-analytics repositories.
> >>>>>> I've shopped the idea around to some of you and gotten universally positive feedback with some questions about details we deferred to this discussion.
> >>>>>>
> >>>>>> Reasons we should merge:
> >>>>>>
> >>>>>> Break circular dependencies between the 2 projects
> >>>>>> Remove redundant copy/pasted code
> >>>>>> Simplify build and CI
> >>>>>> Reduce friction on changes that span both projects
> >>>>>> Simplify the CDC implementation
> >>>>>>
> >>>>>>
> >>>>>> Outstanding questions and observations that came up:
> >>>>>>
> >>>>>> Do we merge one repository into the other? Or do we create a new project and bring them both in?
> >>>>>> What do we do about JIRA? Leave separate or combine?
> >>>>>> What do we do with open issues and PR's in github?
> >>>>>> We'll need to thoughtfully update CI (github + circle) since we're right at the limit on the free tier on both projects
> >>>>>> What do we do about existing deprecated repositories (cassandra-analytics and/or cassandra-sidecar)?
> >>>>>> We'll need to update our release process
> >>>>>>
> >>>>>>
> >>>>>> Other observations or questions welcome, as are thoughts on the entire process, on the outstanding questions, etc.
> >>>>>>
> >>>>>> Looking forward to the discussion everyone.
> >>>>>>
> >>>>>> ~Josh
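
To make the versioning policy discussed in this thread concrete (same major.minor means a guaranteed-compatible Analytics/Sidecar pair, with Sidecar patch releases free to advance), here is a minimal sketch of how the check could look. The names `Version` and `is_guaranteed_compatible` are hypothetical and exist only for illustration; nothing like this is in either repo today, and the actual policy is still up for discussion.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A plain major.minor.patch version (hypothetical helper)."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Assumes a strict "X.Y.Z" string; qualifiers like "-SNAPSHOT" are out of scope.
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def is_guaranteed_compatible(sidecar: str, analytics: str) -> bool:
    """Return True when both releases share major.minor; patch is ignored."""
    s = Version.parse(sidecar)
    a = Version.parse(analytics)
    return (s.major, s.minor) == (a.major, a.minor)
```

With the examples from the thread, `is_guaranteed_compatible("1.0.5", "1.0.0")` and `is_guaranteed_compatible("1.1.5", "1.1.0")` hold, while Sidecar 1.0.5 with Analytics 1.1.0 does not, which matches the "you would need to upgrade to 1.1.0" case. A False result only means the pair is not covered by the promise, not that it is known to break.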
