As I was reviewing this, it occurred to me that it talks about Sidecar as if it were an established thing, but that CEP has been stalled for quite some time: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224
If work on this is being done, should we get this official and wrapped up?

On to the proposal...

This has been a topic on the project for over 10 years now. I've seen multiple goes at making this work, and the issue that always turns out to torpedo the project is handling dupes, to the point that they go from a generalized Kafka producer engine to something specific to a particular use case. I don't see much on how this would be handled other than "left to the end user to figure out." There is also little mention of where the increased resource load would be handled.

This has been discussed many times before, but is it time to introduce the concept of an elected leader for a token range for this type of operation? It would eliminate a ton of problems that need to be managed when bridging C* to a system like Kafka. Last time it was discussed in earnest was for KIP-30: https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems

Patrick

On Sat, Sep 28, 2024 at 11:44 AM Jon Haddad <j...@rustyrazorblade.com> wrote:

> Yes! I’m really looking forward to trying this out. The CEP looks really
> well thought out. I think this will make CDC a lot more useful for a lot of
> teams.
>
> Jon
>
>
> On Fri, Sep 27, 2024 at 4:23 PM Josh McKenzie <jmcken...@apache.org>
> wrote:
>
>> Really excited to see this hit the ML James.
>>
>> As author of the base CDC (get your stones ready for throwing :D) and
>> someone moderately involved in the CEP here, definitely welcome any
>> questions. CDC is a *thorny problem* in a multi-replica distributed
>> system like this.
>>
>> On Fri, Sep 27, 2024, at 5:40 PM, James Berragan wrote:
>>
>> Hi everyone,
>>
>> Wiki:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-44%3A+Kafka+integration+for+Cassandra+CDC+using+Sidecar
>>
>> We would like to propose this CEP for adoption by the community.
>>
>> CDC is a common technique in databases but right now there is no
>> out-of-the-box solution to do this easily and at scale with Cassandra. Our
>> proposal is to build a fully-fledged solution into the Apache Cassandra
>> Sidecar. This comes with a number of benefits:
>> - Sidecar is an official part of the existing Cassandra eco-system.
>> - Sidecar runs co-located with Cassandra instances and so scales with the
>> cluster size.
>> - Sidecar can access the underlying Cassandra database to store CDC
>> configuration and the CDC state in a special table.
>> - Running in the Sidecar does not require additional external resources
>> to run.
>>
>> The core CDC module we anticipate will be pluggable and re-usable, it is
>> available for review here:
>> https://github.com/apache/cassandra-analytics/pull/87. The remaining
>> Sidecar code will follow.
>>
>> As a reminder, please keep the discussion here on the dev list vs. in the
>> wiki, as we’ve found it easier to manage via email.
>>
>> Sincerely,
>> James Berragan
>> Bernardo Botella Corbi
>> Yifan Cai
>> Jyothsna Konisa