Re: [DISCUSS] CEP-44: Kafka integration for Cassandra CDC using Sidecar

Patrick McFadin Sun, 29 Sep 2024 19:05:50 -0700

As I was reviewing this, it occurred to me that it was talking about
Sidecar like it was a thing but that CEP has been stalled for quite some
time:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224

If work on this is being done, should we make it official and get it wrapped up?

On to the proposal...

This has been a topic on the project for over 10 years now. I've seen
multiple goes at making this work, and the issue that always turns out to
torpedo the project is handling dupes, to the point that these efforts go
from a generalized Kafka producer engine to something specific to a
particular use case. I don't see much on how this would be handled other
than "left to the end user to figure out."

There is also little mention of where the increased resource load would be
handled.

This has been discussed many times before, but is it time to introduce the
concept of an elected leader for a token range for this type of operation?
It would eliminate a ton of problems that need to be managed when bridging
C* to a system like Kafka. The last time it was discussed in earnest was for
KIP-30:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems

Patrick

On Sat, Sep 28, 2024 at 11:44 AM Jon Haddad <[email protected]> wrote:

> Yes! I’m really looking forward to trying this out. The CEP looks really
> well thought out. I think this will make CDC a lot more useful for a lot of
> teams.
>
> Jon
>
>
> On Fri, Sep 27, 2024 at 4:23 PM Josh McKenzie <[email protected]>
> wrote:
>
>> Really excited to see this hit the ML, James.
>>
>> As author of the base CDC (get your stones ready for throwing :D) and
>> someone moderately involved in the CEP here, definitely welcome any
>> questions. CDC is a *thorny problem* in a multi-replica distributed
>> system like this.
>>
>> On Fri, Sep 27, 2024, at 5:40 PM, James Berragan wrote:
>>
>> Hi everyone,
>>
>> Wiki:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-44%3A+Kafka+integration+for+Cassandra+CDC+using+Sidecar
>>
>> We would like to propose this CEP for adoption by the community.
>>
>> CDC is a common technique in databases but right now there is no
>> out-of-the-box solution to do this easily and at scale with Cassandra. Our
>> proposal is to build a fully-fledged solution into the Apache Cassandra
>> Sidecar. This comes with a number of benefits:
>> - Sidecar is an official part of the existing Cassandra ecosystem.
>> - Sidecar runs co-located with Cassandra instances and so scales with the
>> cluster size.
>> - Sidecar can access the underlying Cassandra database to store CDC
>> configuration and the CDC state in a special table.
>> - Running in the Sidecar does not require additional external resources
>> to run.
>>
>> We anticipate the core CDC module will be pluggable and reusable; it is
>> available for review here:
>> https://github.com/apache/cassandra-analytics/pull/87. The remaining
>> Sidecar code will follow.
>>
>> As a reminder, please keep the discussion here on the dev list vs. in the
>> wiki, as we’ve found it easier to manage via email.
>>
>> Sincerely,
>> James Berragan
>> Bernardo Botella Corbi
>> Yifan Cai
>> Jyothsna Konisa
>>
>>
>>
