[ https://issues.apache.org/jira/browse/CASSANDRA-12148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joshua McKenzie updated CASSANDRA-12148:
----------------------------------------
Reviewer: Branimir Lambov
Status: Patch Available (was: Open)
Patch available on linked branch with modified tests. Still need to write a
dtest for the replay logic since replaying from within a utest is a chore.
Targeting 3.10 for this so no rush on the review.
||branch||testall||dtest||
|[12148|https://github.com/josh-mckenzie/cassandra/tree/12148]|[testall|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-12148-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-12148-dtest]|
> Improve determinism of CDC data availability
> --------------------------------------------
>
> Key: CASSANDRA-12148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Joshua McKenzie
> Assignee: Joshua McKenzie
>
> The latency with which CDC data becomes available has a known limitation due
> to our reliance on CommitLogSegments being discarded before the data is
> available in cdc_raw: if a slowly written table shares a CommitLogSegment
> with CDC data, the CommitLogSegment won't be flushed until we hit either
> memory pressure on memtables or CommitLog limit pressure.
> Ultimately, this leaves a non-deterministic element to when data becomes
> available for CDC consumption unless a consumer parses live CommitLogSegments.
> To work around this limitation and make semi-realtime CDC consumption more
> friendly to end-users, I propose we extend CDC as follows:
> h6. High level:
> * Consumers parse hard links of active CommitLogSegments in cdc_raw instead
> of waiting for flush/discard and file move
> * C* stores an offset of the highest seen CDC mutation in a separate idx file
> per commit log segment in cdc_raw. Clients tail this index file, compare it
> against their local last parsed offset on change, and parse the corresponding
> commit log segment using their last parsed offset as the minimum
> * C* flags that index file with an offset and DONE when the file is flushed
> so clients know when they can clean up
> h6. Details:
> * On creation of a CommitLogSegment, also hard-link the file in cdc_raw
> * On first write of a CDC-enabled mutation to a segment, we:
> ** Flag it as {{CDCState.CONTAINS}}
> ** Set a long tracking the {{CommitLogPosition}} of the 1st CDC-enabled
> mutation in the log
> ** Set a long in the CommitLogSegment tracking the offset of the end of the
> last written CDC mutation in the segment if higher than the previously known
> highest CDC offset
> * On subsequent writes to the segment, we update the offset of the highest
> known CDC data
> * On CommitLogSegment fsync, we write a file in cdc_raw as
> <segment_name>_cdc.idx containing the min offset and end offset fsynced to
> disk per file
> * On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete the
> segment in both commitlog and cdc_raw
> * On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the
> segment in commitlog and update the <segment_name>_cdc.idx file with the end
> offset and a DONE marker
> * On segment replay, store the highest end offset of seen CDC-enabled
> mutations from a segment and write that to <segment_name>_cdc.idx on
> completion of segment replay. This should bridge the potential correctness
> gap of a node writing to a segment and then dying before it can write the
> <segment_name>_cdc.idx file.
> This should allow clients to skip the beginning of a file to the 1st CDC
> mutation, track an offset of how far they've parsed, compare it against the
> _cdc.idx file end offset, and use that to determine when to parse new
> CDC data. Any existing clients written to the initial implementation of CDC
> need only add the <segment_name>_cdc.idx logic and a check for the DONE
> marker to their code. The burden on users to update to support this should
> be quite small for the benefit of having data available as soon as it's
> fsynced instead of at a non-deterministic time when potentially unrelated
> tables are flushed.
> Finally, we should look into extending the interface on CommitLogReader to be
> more friendly for realtime parsing, perhaps supporting taking a
> CommitLogDescriptor and RandomAccessReader and resuming readSection calls,
> assuming the reader is at the start of a SyncSegment. Would probably also
> need to rewind to the start of the segment before returning so subsequent
> calls would respect this contract. This would skip needing to deserialize the
> descriptor and all completed SyncSegments to get to the root of the desired
> segment for parsing.
> One alternative we discussed offline - instead of just storing the highest
> seen CDC offset, we could instead store an offset per CDC mutation
> (potentially delta encoded) in the idx file to allow clients to seek and only
> parse the mutations with CDC enabled. My hunch is that the performance delta
> from doing so wouldn't justify the complexity given the SyncSegment
> deserialization and seeking restrictions in the compressed and encrypted
> cases as mentioned above.
> The only complication I can think of with the above design is uncompressed
> mmapped CommitLogSegments on Windows being undeletable, but it'd be pretty
> simple to disallow configuration of CDC with an uncompressed CommitLog in
> that environment.
> And as a final note: while the above might sound involved, it really
> shouldn't be a big change from where we are with v1 of CDC, either from a C*
> complexity or code perspective or from a client implementation perspective.
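
The client-side bookkeeping described in the quoted proposal (track a last parsed offset, compare it against the _cdc.idx end offset, clean up on DONE) could look roughly like the sketch below. It is illustrative only and not taken from the patch. The on-disk layout of the idx file is an assumption made for this example: one line holding "<min_offset> <end_offset>", optionally followed by a line reading DONE. The names CdcIndex, parse_cdc_index, next_range and can_clean_up are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CdcIndex:
    min_offset: int  # offset of the first CDC mutation in the segment
    end_offset: int  # end of the last CDC mutation fsynced to disk
    done: bool       # True once the DONE marker has been written


def parse_cdc_index(text: str) -> CdcIndex:
    """Parse the ASSUMED idx layout: '<min> <end>' then an optional 'DONE' line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty index file")
    min_s, end_s = lines[0].split()
    done = len(lines) > 1 and lines[1] == "DONE"
    return CdcIndex(int(min_s), int(end_s), done)


def next_range(index: CdcIndex, last_parsed: Optional[int]) -> Optional[Tuple[int, int]]:
    """Return the (start, end) byte range still to parse, or None if caught up."""
    # A new client skips straight to the first CDC mutation in the segment.
    start = index.min_offset if last_parsed is None else max(last_parsed, index.min_offset)
    if index.end_offset > start:
        return (start, index.end_offset)
    return None


def can_clean_up(index: CdcIndex, last_parsed: Optional[int]) -> bool:
    """The hard link and idx file can go once DONE is set and all data is parsed."""
    return index.done and last_parsed is not None and last_parsed >= index.end_offset
```

A consumer would re-read the idx file whenever it changes, call next_range with its stored offset, parse that range of the hard-linked segment in cdc_raw, and persist the returned end offset as its new last parsed offset.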