This thread has been quiet for a few days. Anybody else have anything they want to bring up before I start drafting up a CEP for this work?
On Thu, Jun 4, 2026, at 12:36 PM, Patrick McFadin wrote:
> +1 on cassandra-ecosystem. Cassandra-buddy would be fun, but sadly, ecosystem
> is more on brand for what this needs to be.
>
> +1 on a CEP just as a matter of record and consensus we can point people to
> when they want to participate.
>
> Patrick
>
> On Thu, Jun 4, 2026 at 9:32 AM Yifan Cai <[email protected]> wrote:
>> Happy to go with *cassandra-ecosystem*. The community enthusiasm for the
>> name is a good signal in itself.
>> The one mild concern I had was that "ecosystem" could imply Cassandra core
>> is included in scope, but I think that is easily addressed with a clear
>> repository description and README introduction. Consider my earlier
>> suggestion withdrawn.
>>
>> A CEP is a great idea, and it doesn't need to be exhaustive. It is a place
>> to record the decisions made in this thread, so that they are explicitly
>> committed to rather than informally agreed upon in a mailing list thread.
>> It also directly addresses Jeremiah's concern: the stability annotations and
>> CI enforcement mechanisms we discussed are exactly the kind of promises that
>> belong in a CEP, where new contributors can find them and understand the
>> expectations from day one.
>>
>> - Yifan
>>
>> On Thu, Jun 4, 2026 at 7:33 AM Ekaterina Dimitrova <[email protected]>
>> wrote:
>>> The proposal for a CEP comes from the outcome I see coming from this valuable
>>> discussion - people overall agree a merge is valuable as long as the
>>> concerns outlined are hashed out.
>>>
>>> On Thu, 4 Jun 2026 at 10:28, Ekaterina Dimitrova <[email protected]>
>>> wrote:
>>>> Is this worth a CEP?
>>>>
>>>> To outline all concerns and expectations?
>>>> - backwards compatibility
>>>> - releases
>>>> - API
>>>> - repos
>>>> - Jira
>>>> - CI
>>>> Etc
>>>>
>>>> It can help us also to make some promises and work towards them; document
>>>> them more explicitly and make it easier for anyone new starting to find
>>>> out what the expectations are.
>>>> Does it make sense?
>>>>
>>>> I mean it doesn’t have to be a 10-page CEP
>>>>
>>>>
>>>> On Thu, 4 Jun 2026 at 9:58, Josh McKenzie <[email protected]> wrote:
>>>>> I prefer cassandra-ecosystem over cassandra-companion. Keeps our options
>>>>> more open going forward (i.e. is a driver a companion? ... no?)
>>>>>
>>>>> To your point Jeremiah, while you'd think having the 2 projects in
>>>>> separate repos would force us to have cleaner APIs defined between them
>>>>> and versioning, in practice that's not the case today. The discipline /
>>>>> energy required to define a clear API boundary and rev it is probably
>>>>> comparable between the 2 paradigms (i.e. status quo dual repo: less
>>>>> discipline required, more energy; monorepo: more discipline required,
>>>>> less energy). At the end of the day I'd posit this is something we've
>>>>> been very poor at as a community across our entire ecosystem. This will
>>>>> be a new muscle for us to build regardless of how the repos are set up.
>>>>>
>>>>> Ideally the 2 projects would be independent of one another and have a
>>>>> shared artifact they both depend upon and that API is how we specify
>>>>> compatibility. That should be relatively straightforward to do in a
>>>>> monorepo w/some refactoring, and if we can get to a shared library we
>>>>> publish from a cassandra-ecosystem repo, we can version that and then
>>>>> it's as simple as "if projects you're working with support the same
>>>>> shared library version, they are compatible".
>>>>>
>>>>> As I write that out, it strikes me that the shared information between
>>>>> them could in theory one day be promoted to a higher architectural tier
>>>>> of shared library where we factor out shared code from analytics and the
>>>>> sidecar, and we factor out shared code from core Cassandra that the
>>>>> ecosystem depends on (i.e. "cassandra-shared", or "cassandra-lib"). Then
>>>>> all 3 projects (+ drivers?)
>>>>> could take a dependency on that shared
>>>>> library, we rev the version of that, and compatibility is defined by that
>>>>> shared substrate.
>>>>>
>>>>> All very "long term down the road" considerations, but the shape of "get
>>>>> things closer together so they're easier to mutate and work with, then
>>>>> massage the structure and dependencies to make the boundaries and
>>>>> versioning clear through implicit structure" appeals to me.
>>>>>
>>>>> On Thu, Jun 4, 2026, at 6:00 AM, Shailaja Koppu wrote:
>>>>>> - I like the name cassandra-ecosystem
>>>>>> - We cannot draw a dependency direction between Analytics and Sidecar.
>>>>>> With the Analytics on S3 feature, Analytics can work without Sidecar.
>>>>>> Sidecar has many features that have nothing to do with Analytics. So
>>>>>> both can be independent of each other.
>>>>>> - The name cassandra-ecosystem allows us to integrate more such
>>>>>> features/components into the repo
>>>>>>
>>>>>>
>>>>>>
>>>>>> > On Jun 4, 2026, at 10:50 AM, Štefan Miklošovič
>>>>>> > <[email protected]> wrote:
>>>>>> >
>>>>>> > That all makes sense, Yifan.
>>>>>> >
>>>>>> > The only issue is not really an issue so much as a consequence
>>>>>> > of doing it like that. Imagine that there is a change in Analytics but
>>>>>> > none in Sidecar and we release a new version. That means that
>>>>>> > Analytics would contain a new patch but Sidecar would be a "dummy"
>>>>>> > release. We would bump the version of Sidecar just for the sake of it.
>>>>>> > Then people trying to investigate what has changed between these
>>>>>> > versions would realize that, awkwardly, nothing changed.
>>>>>> >
>>>>>> > I can live with it. It is just something to be aware of.
>>>>>> >
>>>>>> > On Thu, Jun 4, 2026 at 9:42 AM Yifan Cai <[email protected]> wrote:
>>>>>> >>
>>>>>> >> Hi all,
>>>>>> >>
>>>>>> >> Thanks for the great discussion so far.
>>>>>> >> A few thoughts on the open
>>>>>> >> questions:
>>>>>> >>
>>>>>> >> Naming
>>>>>> >>
>>>>>> >> I'd like to suggest cassandra-companion as the name for the merged
>>>>>> >> repository. Both existing names create confusion in opposite
>>>>>> >> directions: operational features like rolling restart and health
>>>>>> >> monitoring feel out of place in cassandra-analytics (Joey's point),
>>>>>> >> while a bulk read/write connector library feels out of place in
>>>>>> >> cassandra-sidecar. A new neutral name avoids subordinating either
>>>>>> >> project's identity to the other, and is broad enough to accommodate
>>>>>> >> future additions beyond Analytics and Sidecar, without implying
>>>>>> >> Cassandra core is included, as names like cassandra-ecosystem or
>>>>>> >> cassandra-platform might.
>>>>>> >>
>>>>>> >> For the JIRA project key, CASSCOMP would be a natural fit.
>>>>>> >>
>>>>>> >> API Compatibility
>>>>>> >>
>>>>>> >> Jeremiah raises a valid concern: co-locating the client and server
>>>>>> >> removes the repo boundary that previously reminded developers they
>>>>>> >> are touching a public API surface. Štefan's versioning model
>>>>>> >> addresses the consumer-facing question ("what runs with what") well,
>>>>>> >> but we also need developer-facing guardrails to mechanically enforce
>>>>>> >> the promise. I'd propose combining three layers:
>>>>>> >>
>>>>>> >> Versioning contract (Štefan's model): same major.minor guarantees a
>>>>>> >> compatible Analytics/Sidecar pair; patch releases of Sidecar are safe
>>>>>> >> to advance independently; new REST endpoints require a minor bump
>>>>>> >> Unified version and release cadence: all modules release together
>>>>>> >> under the same version number. This directly aligns with the merge's
>>>>>> >> core motivation of reducing coordination overhead.
>>>>>> >> The alternative,
>>>>>> >> independent module versioning within the monorepo, would essentially
>>>>>> >> recreate the cross-repo coordination friction we are trying to
>>>>>> >> eliminate. Conveniently, Analytics and Sidecar are currently at the
>>>>>> >> same version number, so there is no awkward jump or reset needed at
>>>>>> >> the point of merge.
>>>>>> >> CI enforcement: an OpenAPI contract test that fails if a change
>>>>>> >> breaks the API surface relative to the previous release, plus a
>>>>>> >> compatibility matrix test that runs the N-1 Analytics client against
>>>>>> >> the current Sidecar server
>>>>>> >> Stability annotations: adopt @PublicApi / @InternalApi / @Stable /
>>>>>> >> @Evolving / @Deprecated annotations on the Sidecar API surface,
>>>>>> >> following the pattern established by Kafka and Elasticsearch. This
>>>>>> >> makes the contract explicit and discoverable in code: a developer
>>>>>> >> touching an annotated method immediately sees its stability guarantee
>>>>>> >> and since which version it has been public
>>>>>> >>
>>>>>> >> The three layers are complementary: the versioning model defines the
>>>>>> >> promise, annotations mark the contract in code, and CI enforces the
>>>>>> >> promise mechanically. The unified release cadence ensures the promise
>>>>>> >> is always evaluated as a whole.
>>>>>> >>
>>>>>> >> As a side note, Cassandra core currently lacks this kind of API
>>>>>> >> stability clarity, which creates real friction for downstream
>>>>>> >> projects. Establishing this practice in the companion project gives
>>>>>> >> us a concrete, working reference that could motivate and inform a
>>>>>> >> broader Cassandra core evolution down the road. Happy to discuss that
>>>>>> >> separately if there is interest.
>>>>>> >>
>>>>>> >> Looking forward to hearing everyone's thoughts.
>>>>>> >>
>>>>>> >> Thanks
>>>>>> >> - Yifan
>>>>>> >>
>>>>>> >> On Wed, Jun 3, 2026 at 11:32 PM Štefan Miklošovič
>>>>>> >> <[email protected]> wrote:
>>>>>> >>>
>>>>>> >>> Hi Jeremiah,
>>>>>> >>>
>>>>>> >>> For now, what I find difficult, and I found myself questioning this
>>>>>> >>> repeatedly, is "what version of Sidecar can I run with Analytics?" Is
>>>>>> >>> Sidecar 0.2.0 compatible with Analytics 0.4.0? We just don't know
>>>>>> >>> until we run it and try. There is no compatibility matrix for what
>>>>>> >>> goes with what. If each component is developed independently then I
>>>>>> >>> think it will be more messy than if it was released in lock-step.
>>>>>> >>>
>>>>>> >>> We might establish a policy that e.g. a patch release of Sidecar is
>>>>>> >>> compatible with the matching minor in Analytics. For example, we release
>>>>>> >>> both Sidecar and Analytics under unified version 1.0.0. Then we will
>>>>>> >>> release 1.0.5 of both next. So we can say that Sidecar 1.0.5 is
>>>>>> >>> compatible with Analytics 1.0.0. Or Sidecar 1.1.5 is compatible with
>>>>>> >>> Analytics 1.1.0. Basically, Sidecar is a standalone server app a user
>>>>>> >>> can run without Analytics but once they are interested in the Analytics
>>>>>> >>> combo, they would need to run with respective Analytics releases.
>>>>>> >>>
>>>>>> >>> If we release Analytics and Sidecar 1.1.0 and you have Sidecar 1.0.5
>>>>>> >>> then you would need to upgrade to 1.1.0 to be sure that it is
>>>>>> >>> compatible with Analytics 100% while you could just bump patch
>>>>>> >>> releases for Sidecar endlessly if you are interested in Sidecar
>>>>>> >>> without Analytics.
>>>>>> >>>
>>>>>> >>> This would of course mean that there would need to be awareness of
>>>>>> >>> "will this patch I want to ship to Sidecar work in the related Analytics
>>>>>> >>> minor version when we release it?". We might also say that a new REST
>>>>>> >>> endpoint can go only into a new minor version and similar.
>>>>>> >>>
>>>>>> >>> This was, of course, just an example and it is all tweakable.
>>>>>> >>>
>>>>>> >>> On Wed, Jun 3, 2026 at 11:44 PM Jeremiah Jordan
>>>>>> >>> <[email protected]> wrote:
>>>>>> >>>>>
>>>>>> >>>>> I worry if we move into the Sidecar repo it's just going to become
>>>>>> >>>>> more coupled and folks in the community are already using
>>>>>> >>>>> Analytics to read from e.g. S3 buckets or other data sources.
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> I have similar concerns. If we start releasing them in lockstep
>>>>>> >>>> from the same repo, then I worry that people will start making
>>>>>> >>>> breaking changes to sidecar APIs such that existing Analytics jars
>>>>>> >>>> out in the wild will not work, without realizing it.
>>>>>> >>>>
>>>>>> >>>> Both cassandra-analytics and the cassandra-sidecar are starting to
>>>>>> >>>> be used out in the world by people in production settings. My
>>>>>> >>>> expectation for updates to the sidecar APIs is that anything done
>>>>>> >>>> should not break existing clients. When the client and the server
>>>>>> >>>> are in different repos, it is much cleaner and clearer to people
>>>>>> >>>> that you are exposing an API surface which is being consumed
>>>>>> >>>> externally, and you need to keep things like backwards
>>>>>> >>>> compatibility in mind. If the client and the server live in the
>>>>>> >>>> same repo, and are released together, I can see people just
>>>>>> >>>> changing/refactoring both and not considering existing clients out
>>>>>> >>>> in the wild. I think them being in separate repos makes that
>>>>>> >>>> distinction clearer to someone working on a new feature that spans
>>>>>> >>>> both code bases.
>>>>>> >>>>
>>>>>> >>>> Seems like many here want them in the same repo, so I won’t block
>>>>>> >>>> that, but I have concerns.
>>>>>> >>>>
>>>>>> >>>> If we do decide to merge them, I think it should be in a new repo
>>>>>> >>>> with a new name.
>>>>>> >>>> I do not think the sidecar belongs in a repo
>>>>>> >>>> named analytics, or the analytics library belongs in a repo named
>>>>>> >>>> sidecar. They both have use cases that do not involve the other.
>>>>>> >>>>
>>>>>> >>>> -Jeremiah Jordan
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On Jun 3, 2026 at 11:42:15 AM, James Berragan <[email protected]>
>>>>>> >>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Can we break down a bit more where the circular dependency lies?
>>>>>> >>>>> I'm not against it, I just want to make sure we're solving the
>>>>>> >>>>> right problem here. Analytics and CDC were always designed to be
>>>>>> >>>>> agnostic of the Sidecar. What stops us moving just the Sidecar
>>>>>> >>>>> specific parts into the Sidecar repo? I worry if we move into the
>>>>>> >>>>> Sidecar repo it's just going to become more coupled and folks in
>>>>>> >>>>> the community are already using Analytics to read from e.g. S3
>>>>>> >>>>> buckets or other data sources.
>>>>>> >>>>>
>>>>>> >>>>> James.
>>>>>> >>>>>
>>>>>> >>>>> On Tue, 2 Jun 2026 at 13:20, Josh McKenzie <[email protected]>
>>>>>> >>>>> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> I'd like to propose we merge the cassandra-sidecar and
>>>>>> >>>>>> cassandra-analytics repositories. I've shopped the idea around to
>>>>>> >>>>>> some of you and gotten universally positive feedback with some
>>>>>> >>>>>> questions about details we deferred to this discussion.
>>>>>> >>>>>>
>>>>>> >>>>>> Reasons we should merge:
>>>>>> >>>>>>
>>>>>> >>>>>> Break circular dependencies between the 2 projects
>>>>>> >>>>>> Remove redundant copy/pasted code
>>>>>> >>>>>> Simplify build and CI
>>>>>> >>>>>> Reduce friction on changes that span both projects
>>>>>> >>>>>> Simplify the CDC implementation
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> Outstanding questions and observations that came up:
>>>>>> >>>>>>
>>>>>> >>>>>> Do we merge one repository into the other? Or do we create a new
>>>>>> >>>>>> project and bring them both in?
>>>>>> >>>>>> What do we do about JIRA? Leave separate or combine?
>>>>>> >>>>>> What do we do with open issues and PRs in GitHub?
>>>>>> >>>>>> We'll need to thoughtfully update CI (GitHub + circle) since
>>>>>> >>>>>> we're right at the limit on the free tier on both projects
>>>>>> >>>>>> What do we do about existing deprecated repositories
>>>>>> >>>>>> (cassandra-analytics and/or cassandra-sidecar)?
>>>>>> >>>>>> We'll need to update our release process
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> Other observations or questions welcome, as are thoughts on the
>>>>>> >>>>>> entire process, on the outstanding questions, etc.
>>>>>> >>>>>>
>>>>>> >>>>>> Looking forward to the discussion everyone.
>>>>>> >>>>>>
>>>>>> >>>>>> ~Josh
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>
