My understanding is that all CDC really is now is a stable commit log reader.
For a given mutation on an RF=3 system, you'll end up with 3 readers that
all *could* do some action. For now let's just say "put it in a Kafka
topic", because that lets us do anything we want after that.
I suppose the
Jon,
You know I've not actually spent the hour to read the ticket, so I was just
guessing it didn't handle dedup... all the same semantics apply though: you'd
have to do a read before write and then allow for some window of failure.
Maybe if you were to LWT everything, but that sounds really
I'm having a hard time seeing how anyone would be able to work with CDC in
its current implementation, which doesn't do any dedupe. Unless you really
want to write all of your own logic for that, including failure handling plus a
distributed state machine, I wouldn't count on it as a solution.
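Just to show how much machinery even the "simple" version is, here's roughly what the read-before-write dedup consumer looks like with an in-memory window (all the names here are invented; the bounded LinkedHashMap is exactly the failure window I mean, since losing the process means either re-emitting duplicates or rebuilding the seen-set from somewhere durable):

import java.time.Duration;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DedupConsumer {
    // Bounded "seen" window: once it fills up, the oldest mutation ids fall out,
    // and anything older than the window can slip through as a duplicate.
    private static final int WINDOW = 1_000_000;
    private static final Map<String, Boolean> seen =
        new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > WINDOW;
            }
        };

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "cdc-dedup");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("cdc-mutations"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // The "read before write": only forward a mutation id the first time we see it.
                    if (seen.putIfAbsent(record.key(), Boolean.TRUE) == null) {
                        forwardToDestination(record);
                    }
                }
            }
        }
    }

    // Stand-in for the actual write to your downstream system.
    private static void forwardToDestination(ConsumerRecord<String, String> record) {
        System.out.println("forwarding " + record.key());
    }
}

And that's before you've dealt with consumer rebalances, committed offsets drifting away from the seen-window, or making the window durable, which is where the distributed state machine comes in.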
You can follow the monster of a ticket
https://issues.apache.org/jira/browse/CASSANDRA-8844 and see if it looks like
the tradeoffs there are headed in the right direction for you.
Even CDC, I think, would have logically the same issue of not deduping for you as
triggers and dual writes, due to
Thanks Ryan. I was hoping there was a change data capture framework. We
have late-arriving events, some of which can be very late. We would have
to batch-collect data for a large time period every so often to go back and
collect those, or accept that we are going to lose a small percentage of
The typical pattern I've seen in the field is Kafka + consumers for each
destination (a variant of dual write, I know); this of course would not work for
your goal of relying on C* for dedup. Triggers would unfortunately suffer the same
problem, so you're really left with a batch job (most