The proposal for a CEP comes from the outcome I see coming from this valuable discussion: people overall agree a merge is valuable as long as the concerns outlined are hashed out.
On Thu, 4 Jun 2026 at 10:28, Ekaterina Dimitrova <[email protected]> wrote:
> Is this CEP- worth it?
>
> To outline all concerns and expectations?
> - backwards compatibility
> - releases
> - API
> - repos
> - Jira
> - CI
> Etc
>
> It can help us also to make some promises and work towards them; document
> them more explicitly and make it easier for anyone new starting to find out
> what the expectations are. Does it make sense?
>
> I mean it doesn’t have to be 10 pages CEP
>
>
> On Thu, 4 Jun 2026 at 9:58, Josh McKenzie <[email protected]> wrote:
>
>> I prefer cassandra-ecosystem over cassandra-companion. Keeps our options
>> more open going forward (i.e. is a driver a companion? ... no?)
>>
>> To your point Jeremiah, while you'd think having the 2 projects in
>> separate repos would force us to have cleaner APIs defined between them and
>> versioning, in practice that's not the case today. The discipline / energy
>> required to define a clear API boundary and rev it is probably comparable
>> between the 2 paradigms (i.e. status quo dual repo: less discipline
>> required, more energy, monorepo: more discipline required, less energy). At
>> the end of the day I'd posit this is something we've been very poor at as a
>> community across our entire ecosystem. This will be a new muscle for us to
>> build regardless of how the repos are setup.
>>
>> Ideally the 2 projects would be independent of one another and have a
>> shared artifact they both depend upon and that API is how we specify
>> compatibility. That should be relatively straightforward to do in a
>> monorepo w/some refactoring, and if we can get to a shared library we
>> publish from a cassandra-ecosystem repo, we can version that and then it's
>> as simple as "if projects you're working with support the same shared
>> library version, they are compatible".
>>
>> As I write that out, it strikes me that the shared information between
>> them could in theory one day be promoted to a higher architectural tier of
>> shared library where we factor out shared code from analytics and the
>> sidecar, and we factor out shared code from core Cassandra that the
>> ecosystem depends on (i.e. "cassandra-shared", or "cassandra-lib"). Then
>> all 3 projects (+ drivers?) could take a dependency on that shared library,
>> we rev the version of that, and compatibility is defined by that shared
>> substrate.
>>
>> All very "long term down the road" considerations, but the shape of "get
>> things closer together so they're easier to mutate and work with, then
>> massage the structure and dependencies to make the boundaries and
>> versioning clear through implicit structure" appeals to me.
>>
>> On Thu, Jun 4, 2026, at 6:00 AM, Shailaja Koppu wrote:
>>
>> - I like the name cassandra-ecosystem
>> - We cannot draw dependency direction between Analytics and Sidecar. With
>> Analytics on S3 feature, Analytics can work without Sidecar. Sidecar has
>> many features nothing to do with Analytics. So both can be independent of
>> each other.
>> - The name cassandra-ecosystem allows us to integrate more such
>> features/components into the repo
>>
>>
>>
>> > On Jun 4, 2026, at 10:50 AM, Štefan Miklošovič <[email protected]> wrote:
>> >
>> > That all makes sense, Yifan.
>> >
>> > The only issue, it is not actually an issue rather than a consequence
>> > of doing it like that. Imagine that there is a change in Analytics but
>> > none in Sidecar and we release a new version. That means that
>> > Analytics would contain a new patch but Sidecar would be a "dummy"
>> > release. We would bump the version of Sidecar just for the sake of it.
>> > Then people trying to investigate what has changed between these
>> > versions would realize that, awkwardly, nothing changed.
>> >
>> > I can live with it. It is just something to be aware of.
>> >
>> > On Thu, Jun 4, 2026 at 9:42 AM Yifan Cai <[email protected]> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> Thanks for the great discussion so far. A few thoughts on the open
>> >> questions:
>> >>
>> >> Naming
>> >>
>> >> I'd like to suggest cassandra-companion as the name for the merged
>> >> repository. Both existing names create confusion in opposite directions:
>> >> operational features like rolling restart and health monitoring feel out of
>> >> place in cassandra-analytics (Joey's point), while a bulk read/write
>> >> connector library feels out of place in cassandra-sidecar. A new neutral
>> >> name avoids subordinating either project's identity to the other, and is
>> >> broad enough to accommodate future additions beyond Analytics and Sidecar,
>> >> without implying Cassandra core is included, as names like
>> >> cassandra-ecosystem or cassandra-platform might.
>> >>
>> >> For the JIRA project key, CASSCOMP would be a natural fit.
>> >>
>> >> API Compatibility
>> >>
>> >> Jeremiah raises a valid concern — co-locating the client and server
>> >> removes the repo boundary that previously reminded developers they are
>> >> touching a public API surface. Štefan's versioning model addresses the
>> >> consumer-facing question ("what runs with what") well, but we also need
>> >> developer-facing guardrails to mechanically enforce the promise. I'd
>> >> propose combining three layers:
>> >>
>> >> - Versioning contract (Štefan's model): same major.minor guarantees a
>> >> compatible Analytics/Sidecar pair; patch releases of Sidecar are safe to
>> >> advance independently; new REST endpoints require a minor bump
>> >> - Unified version and release cadence: all modules release together
>> >> under the same version number. This directly aligns with the merge's core
>> >> motivation of reducing coordination overhead. The alternative, independent
>> >> module versioning within the monorepo, would essentially recreate the
>> >> cross-repo coordination friction we are trying to eliminate. Conveniently,
>> >> Analytics and Sidecar are currently at the same version number, so there is
>> >> no awkward jump or reset needed at the point of merge.
>> >> - CI enforcement: an OpenAPI contract test that fails if a change breaks
>> >> the API surface relative to the previous release, plus a compatibility
>> >> matrix test that runs the N-1 Analytics client against the current Sidecar
>> >> server
>> >> - Stability annotations: adopt @PublicApi / @InternalApi / @Stable /
>> >> @Evolving / @Deprecated annotations on the Sidecar API surface, following
>> >> the pattern established by Kafka and Elasticsearch. This makes the contract
>> >> explicit and discoverable in code — a developer touching an annotated
>> >> method immediately sees its stability guarantee and since which version it
>> >> has been public
>> >>
>> >> The three layers are complementary: the versioning model defines the
>> >> promise, annotations mark the contract in code, and CI enforces the promise
>> >> mechanically. The unified release cadence ensures the promise is always
>> >> evaluated as a whole.
>> >>
>> >> As a side note — Cassandra core currently lacks this kind of API
>> >> stability clarity, which creates real friction for downstream projects.
>> >> Establishing this practice in the companion project gives us a concrete,
>> >> working reference that could motivate and inform a broader Cassandra core
>> >> evolution down the road. Happy to discuss that separately if there is
>> >> interest.
>> >>
>> >> Looking forward to hearing everyone's thoughts.
>> >>
>> >> Thanks
>> >> - Yifan
>> >>
>> >> On Wed, Jun 3, 2026 at 11:32 PM Štefan Miklošovič <[email protected]> wrote:
>> >>>
>> >>> Hi Jeremiah,
>> >>>
>> >>> for now, what I find difficult and I found myself questioning this
>> >>> repeatedly is "what version of Sidecar can I run with Analytics?" Is
>> >>> Sidecar 0.2.0 compatible with Analytics 0.4.0? We just don't know
>> >>> until we run it and try. There is no compatibility matrix for what
>> >>> goes with what.
>> >>> If each component is developed independently then I
>> >>> think it will be more messy than if it was released in lock-step.
>> >>>
>> >>> We might establish a policy that e.g. a patch release of Sidecar is
>> >>> compatible with whatever minor in Analytics. For example, we release
>> >>> both Sidecar and Analytics under unified version 1.0.0. Then we will
>> >>> release 1.0.5 of both next. So we can say that Sidecar 1.0.5 is
>> >>> compatible with Analytics 1.0.0. Or Sidecar 1.1.5 is compatible with
>> >>> Analytics 1.1.0. Basically, Sidecar is a standalone server app a user
>> >>> can run without Analytics but once they are interested in Analytics
>> >>> combo, they would need to run with respective Analytics releases.
>> >>>
>> >>> If we release Analytics and Sidecar 1.1.0 and you have Sidecar 1.0.5
>> >>> then you would need to upgrade to 1.1.0 to be sure that it is
>> >>> compatible with Analytics 100% while you could just bump patch
>> >>> releases for Sidecar endlessly if you are interested in Sidecar
>> >>> without Analytics.
>> >>>
>> >>> This would of course mean that there would need to be awareness in
>> >>> "will this patch I want to ship to Sidecar work in related Analytics
>> >>> minor version when we release it?". We might also say that a new REST
>> >>> endpoint can go only into a new minor version and similar.
>> >>>
>> >>> This was, of course, just an example and it is all tweakable.
>> >>>
>> >>> On Wed, Jun 3, 2026 at 11:44 PM Jeremiah Jordan <[email protected]> wrote:
>> >>>>>
>> >>>>> I worry if we move into the Sidecar repo it's just going to become
>> >>>>> more coupled and folks in the community are already using Analytics to read
>> >>>>> from e.g. S3 buckets or other data sources.
>> >>>>
>> >>>>
>> >>>> I have similar concerns.
>> >>>> If we start releasing them in lockstep
>> >>>> from the same repo, then I worry that people will start making breaking
>> >>>> changes to sidecar APIs such that existing Analytics jars out in the wild
>> >>>> will not work, without realizing it.
>> >>>>
>> >>>> Both cassandra-analytics and the cassandra-sidecar are starting to
>> >>>> be used out in the world by people in production settings. My expectation
>> >>>> for updates to the sidecar APIs is that anything done should not break
>> >>>> existing clients. When the client and the server are in different repos, it
>> >>>> is much cleaner and clearer to people that you are exposing an API surface
>> >>>> which is being consumed externally, and you need to keep things like
>> >>>> backwards compatibility in mind. If the client and the server live in the
>> >>>> same repo, and are released together, I can see people just
>> >>>> changing/refactoring both and not considering existing clients out in the
>> >>>> wild. I think them being in separate repos makes that distinction clearer
>> >>>> to someone working on a new feature that spans both code bases.
>> >>>>
>> >>>> Seems like many here want them in the same repo, so I won’t block
>> >>>> that, but I have concerns.
>> >>>>
>> >>>> If we do decide to merge them, I think it should be in a new repo
>> >>>> with a new name. I do not think the sidecar belongs in a repo named
>> >>>> analytics, or the analytics library belongs in a repo named sidecar. They
>> >>>> both have use cases that do not involve the other.
>> >>>>
>> >>>> -Jeremiah Jordan
>> >>>>
>> >>>>
>> >>>> On Jun 3, 2026 at 11:42:15 AM, James Berragan <[email protected]> wrote:
>> >>>>>
>> >>>>> Can we break down a bit more where the circular dependency lies,
>> >>>>> I'm not against it, I just want to make sure we're solving the right
>> >>>>> problem here. Analytics and CDC were always designed to be agnostic of the
>> >>>>> Sidecar. What stops us moving just the Sidecar specific parts into the
>> >>>>> Sidecar repo?
>> >>>>> I worry if we move into the Sidecar repo it's just going to
>> >>>>> become more coupled and folks in the community are already using Analytics
>> >>>>> to read from e.g. S3 buckets or other data sources.
>> >>>>>
>> >>>>> James.
>> >>>>>
>> >>>>> On Tue, 2 Jun 2026 at 13:20, Josh McKenzie <[email protected]> wrote:
>> >>>>>>
>> >>>>>> I'd like to propose we merge the cassandra-sidecar and
>> >>>>>> cassandra-analytics repositories. I've shopped the idea around to some of
>> >>>>>> you and gotten universally positive feedback with some questions about
>> >>>>>> details we deferred to this discussion.
>> >>>>>>
>> >>>>>> Reasons we should merge:
>> >>>>>>
>> >>>>>> - Break circular dependencies between the 2 projects
>> >>>>>> - Remove redundant copy/pasted code
>> >>>>>> - Simplify build and CI
>> >>>>>> - Reduce friction on changes that span both projects
>> >>>>>> - Simplify the CDC implementation
>> >>>>>>
>> >>>>>>
>> >>>>>> Outstanding questions and observations that came up:
>> >>>>>>
>> >>>>>> - Do we merge one repository into the other? Or do we create a new
>> >>>>>> project and bring them both in?
>> >>>>>> - What do we do about JIRA? Leave separate or combine?
>> >>>>>> - What do we do with open issues and PR's in github?
>> >>>>>> - We'll need to thoughtfully update CI (github + circle) since we're
>> >>>>>> right at the limit on the free tier on both projects
>> >>>>>> - What do we do about existing deprecated repositories
>> >>>>>> (cassandra-analytics and/or cassandra-sidecar)?
>> >>>>>> - We'll need to update our release process
>> >>>>>>
>> >>>>>>
>> >>>>>> Other observations or questions welcome, as are thoughts on the
>> >>>>>> entire process, on the outstanding questions, etc.
>> >>>>>>
>> >>>>>> Looking forward to the discussion everyone.
>> >>>>>>
>> >>>>>> ~Josh
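
For reference, the versioning rule discussed in the quoted thread (same major.minor means a compatible Analytics/Sidecar pair, Sidecar patch releases can advance on their own, and a new REST endpoint needs a minor bump) could be sketched roughly as below. This is only an illustration in Python under those stated assumptions; the names Version, is_compatible and required_bump are hypothetical and do not exist in either project.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A plain major.minor.patch version (hypothetical helper type)."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Expects exactly three dot-separated integers, e.g. "1.0.5".
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def is_compatible(sidecar: str, analytics: str) -> bool:
    """Assumed rule from the thread: the pair is compatible when
    major and minor match; the patch number is ignored."""
    s, a = Version.parse(sidecar), Version.parse(analytics)
    return (s.major, s.minor) == (a.major, a.minor)


def required_bump(adds_rest_endpoint: bool) -> str:
    """Assumed rule from the thread: a new REST endpoint may only
    ship in a new minor version; other fixes can go into a patch."""
    return "minor" if adds_rest_endpoint else "patch"


if __name__ == "__main__":
    print(is_compatible("1.0.5", "1.0.0"))  # same major.minor
    print(is_compatible("1.0.5", "1.1.0"))  # different minor
    print(required_bump(adds_rest_endpoint=True))
```

Under this sketch, Sidecar 1.0.5 pairs with Analytics 1.0.0 and Sidecar 1.1.5 with Analytics 1.1.0, while Sidecar 1.0.5 would need an upgrade before being paired with Analytics 1.1.0, matching the examples given in the thread.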
