Hey Jay,

Thanks for the pointer.
I have spent quite some time trying to understand this, and even went
through a good deal of
https://github.com/apache/cassandra/commit/e31e216234c6b57a531cae607e0355666007deb2,
but I still cannot work out how the whole thing fits together.


*Can someone please correct my understanding so far (stated below)?*

1) Cassandra would only write to the CDC log, and never delete from it.
2) Cleaning up consumed logfiles would be the client daemon's responsibility.
3) Daemons should be able to checkpoint their work, and resume from where
they left off.
   This means they would have to leave some file artifact in the CDC log's
directory.
4) Upon flush, CommitLogSegments containing data for CDC-enabled tables are
moved to the data/cdc_raw directory until removed by the user
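
To make points 2) and 3) concrete, this is roughly how I picture the
daemon loop. It is a minimal Python sketch under my own assumptions, not
anything taken from Cassandra: the checkpoint file name, the
`CommitLog-*.log` glob pattern and the `handle_segment` callback are all
hypothetical, and actually decoding a segment would still need Cassandra's
own Java commit log classes.

```python
import json
import os
from pathlib import Path
from typing import Callable, List, Set

# Hypothetical name for the checkpoint artifact kept next to the segments.
CHECKPOINT_NAME = "consumer.checkpoint"


def load_done(cdc_dir: Path) -> Set[str]:
    """Return names of segments already handled but not yet deleted."""
    path = cdc_dir / CHECKPOINT_NAME
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_done(cdc_dir: Path, done: Set[str]) -> None:
    """Write the checkpoint to a temp file, then swap it in atomically."""
    tmp = cdc_dir / (CHECKPOINT_NAME + ".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    os.replace(tmp, cdc_dir / CHECKPOINT_NAME)


def consume_once(cdc_dir: Path,
                 handle_segment: Callable[[Path], None]) -> List[str]:
    """Handle every pending segment once, then delete it.

    Returns the names of the segments handled during this pass.
    The glob pattern is an assumption about segment file naming.
    """
    done = load_done(cdc_dir)
    handled = []
    for segment in sorted(cdc_dir.glob("CommitLog-*.log")):
        if segment.name not in done:
            handle_segment(segment)      # hypothetical: parse and forward
            done.add(segment.name)
            save_done(cdc_dir, done)     # remember it before deleting
            handled.append(segment.name)
        segment.unlink()                 # cleanup is the daemon's job
        done.discard(segment.name)
        save_done(cdc_dir, done)
    return handled
```

A daemon would call `consume_once(Path(cdc_raw_directory), handler)` in a
loop with a sleep in between. If it crashes after handling a segment but
before deleting it, the next pass sees the name in the checkpoint and only
deletes the file, so the segment is not handled twice.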


*Questions:*

1) What exactly is written to the commit log? Is it just the ID or the
whole object?
2) If it's just the IDs of the inserted/modified rows, is the client
expected to read the whole object using the ID?
3) If it's the entire payload, how does the client deserialize the payload
into the full row?
4) What about partial updates? Some clients cannot work on partial updates
and will need to read the full object. Any recommendations for those?
5) What is the best way to try out the whole flow? Is it the following:
 - a) Set up cassandra.yaml for cdc and create tables with cdc=true
 - b) Write some data to the table and see the files being generated in the
cdc_raw_directory
 - c) Launch an agent similar to CASSANDRA-11575. Consume and delete the
cdc files?

Thanks for your help,
SG



On Wed, Feb 15, 2017 at 3:19 PM, Jay Zhuang <jay.zhu...@yahoo.com.invalid>
wrote:

> I tried this CASSANDRA-11575 for 3.8. Works great.
>
> Thanks,
> Jay
>
>
> On 2/15/17 3:08 PM, S G wrote:
>
>> Hi,
>>
>> I have gone through several resources mentioned in
>> http://cassandra.apache.org/doc/latest/operating/cdc.html
>>
>> The only thing mentioned about reading the CDC is that it is fairly
>> straightforward with a link to
>> https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140
>>
>> This is way too high level.
>>
>> Can someone please explain or provide me the code to read CDC data after
>> enabling this feature in Cassandra?
>>
>>
>> Thanks
>>
>> SG
>>
>>
