Here's a deck of some proposed additions, discussed at one of the NGCC sessions last fall:
https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf

On Tue, Jan 30, 2018 at 5:10 PM, Andrew Prudhomme <a...@yelp.com> wrote:

> Hi all,
>
> We are currently designing a system that allows our Cassandra clusters to
> produce a stream of data updates. Naturally, we have been evaluating if CDC
> can aid in this endeavor. We have found several challenges in using CDC for
> this purpose.
>
> CDC provides only the mutation as opposed to the full column value, which
> tends to be of limited use for us. Applications might want to know the full
> column value, without having to issue a read back. We also see value in
> being able to publish the full column value both before and after the
> update. This is especially true when deleting a column since this stream
> may be joined with others, or consumers may require other fields to
> properly process the delete.
>
> Additionally, there is some difficulty with processing CDC itself, such as:
> - Updates not being immediately available (addressed by CASSANDRA-12148)
> - Each node providing an independent stream of updates that must be
>   unified and deduplicated
>
> Our question is, what is the vision for CDC development? The current
> implementation could work for some use cases, but is a ways from a general
> streaming solution. I understand that the nature of Cassandra makes this
> quite complicated, but are there any thoughts or desires on the future
> direction of CDC?
>
> Thanks