Transactional metadata and Accord should make it MUCH easier to do duplication-avoiding CDC (and I was going to note that someone should ensure that the interfaces exposed to the public are stable enough that the published API does not have to change once those exist).
On Sep 29, 2024, at 7:04 PM, Patrick McFadin <pmcfa...@gmail.com> wrote:
As I was reviewing this, it occurred to me that it was talking about Sidecar like it was a thing but that CEP has been stalled for quite some time: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224
If work on this is being done, should we get this official and wrapped up?
On to the proposal...
This has been a topic on the project for over 10 years now. I've seen multiple goes at making this work, and the issue that always turns out to torpedo the project is handling dupes. To the point that they go from a generalized Kafka producer engine to something specific to a particular use case. I don't see much on how this would be handled other than "left to the end user to figure out."
There is also little mention of where the increased resource load would be handled.
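To make the duplicate problem concrete: with a replication factor of N, each write lands on N replicas, so a publisher that reads every replica sees the same mutation up to N times. The following is only a minimal sketch of one common mitigation, not something taken from the CEP. The class name `ReplicaDeduplicator`, the digest fields, and the bounded cache are all hypothetical choices made for illustration.

```python
import hashlib
from collections import OrderedDict


class ReplicaDeduplicator:
    """Hypothetical helper: publish a mutation once even though every
    replica that holds it reports its own copy."""

    def __init__(self, max_entries=10_000):
        self.seen = OrderedDict()  # digest -> None, oldest entry first
        self.max_entries = max_entries

    @staticmethod
    def digest(partition_key, write_timestamp_micros, payload):
        # Assumption: identical mutations carry the same key, write
        # timestamp, and serialized payload on every replica.
        h = hashlib.sha256()
        h.update(partition_key.encode())
        h.update(b"\x00")
        h.update(str(write_timestamp_micros).encode())
        h.update(b"\x00")
        h.update(payload)
        return h.hexdigest()

    def should_publish(self, partition_key, write_timestamp_micros, payload):
        d = self.digest(partition_key, write_timestamp_micros, payload)
        if d in self.seen:
            return False  # another replica already reported this mutation
        self.seen[d] = None
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)  # evict the oldest digest
        return True
```

The cache is bounded, so a copy that arrives after its digest has been evicted is published again. That trade-off between memory and duplicate suppression is a large part of why a general solution is hard.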
Patrick
Yes! I’m really looking forward to trying this out. The CEP looks really well thought out. I think this will make CDC a lot more useful for a lot of teams.
Jon
Really excited to see this hit the ML, James.
As author of the base CDC (get your stones ready for throwing :D) and someone moderately involved in the CEP here, definitely welcome any questions. CDC is a thorny problem in a multi-replica distributed system like this.
On Fri, Sep 27, 2024, at 5:40 PM, James Berragan wrote:
Hi everyone,
We would like to propose this CEP for adoption by the community.
CDC is a common technique in databases but right now there is no out-of-the-box solution to do this easily and at scale with Cassandra. Our proposal is to build a fully-fledged solution into the Apache Cassandra Sidecar. This comes with a number of benefits:
- Sidecar is an official part of the existing Cassandra eco-system.
- Sidecar runs co-located with Cassandra instances and so scales with the cluster size.
- Sidecar can access the underlying Cassandra database to store CDC configuration and the CDC state in a special table.
- Running in the Sidecar does not require additional external resources to run.
As a reminder, please keep the discussion here on the dev list vs. in the wiki, as we’ve found it easier to manage via email.
Sincerely,
James Berragan
Bernardo Botella Corbi
Yifan Cai
Jyothsna Konisa