Joshua McKenzie created CASSANDRA-12148:
-------------------------------------------
Summary: Improve determinism of CDC data availability
Key: CASSANDRA-12148
URL: https://issues.apache.org/jira/browse/CASSANDRA-12148
Project: Cassandra
Issue Type: Improvement
Reporter: Joshua McKenzie
Assignee: Joshua McKenzie
The latency with which CDC data becomes available has a known limitation: data
only appears in cdc_raw once its CommitLogSegment has been discarded. If a
slowly written table shares a CommitLogSegment with CDC data, that segment won't
be discarded until either memtable memory pressure or CommitLog space limits
force a flush. Ultimately, this leaves a non-deterministic element to when data
becomes available for CDC consumption unless a consumer parses live
CommitLogSegments.
To work around this limitation and make near-realtime CDC consumption friendlier
for end users, I propose we extend CDC as follows:
h6. High level:
* Consumers parse hard links of active CommitLogSegments in cdc_raw instead of
waiting for flush/discard and file move
* C* stores the offset of the highest seen CDC mutation in a separate idx file
per commit log segment in cdc_raw. Clients tail this index file, compare the new
offset against their locally stored last-parsed offset whenever it changes, and
parse the corresponding commit log segment starting from that last-parsed offset
* C* marks that index file with a final offset and DONE when the segment is
discarded so clients know when they can clean up
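To make the client side of this concrete, the sketch below parses an idx file into its offsets and DONE flag. The proposal does not fix an on-disk layout, so this ASSUMES a hypothetical plain-text one: a first line "<minOffset> <endOffset>" and an optional second line "DONE". The class name {{CdcIdx}} is illustrative only, not part of Cassandra.

```java
import java.util.List;

/**
 * Hypothetical in-memory view of a <segment_name>_cdc.idx file.
 * ASSUMED layout (the proposal does not fix one):
 *   line 1: "<minOffset> <endOffset>"
 *   line 2 (optional): "DONE"
 */
final class CdcIdx {
    final long minOffset; // start of the first CDC-enabled mutation in the segment
    final long endOffset; // end of the highest CDC mutation fsynced so far
    final boolean done;   // set once the segment has been discarded

    CdcIdx(long minOffset, long endOffset, boolean done) {
        this.minOffset = minOffset;
        this.endOffset = endOffset;
        this.done = done;
    }

    /** Parses the assumed layout, e.g. the lines returned by Files.readAllLines(idxPath). */
    static CdcIdx parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("empty idx file");
        }
        String[] parts = lines.get(0).trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected '<min> <end>', got: " + lines.get(0));
        }
        long min = Long.parseLong(parts[0]);
        long end = Long.parseLong(parts[1]);
        if (min < 0 || end < min) {
            throw new IllegalArgumentException("inconsistent offsets: " + min + " " + end);
        }
        boolean done = lines.size() > 1 && lines.get(1).trim().equals("DONE");
        return new CdcIdx(min, end, done);
    }
}
```

Because clients tail this file while C* rewrites it, a real implementation would likely need the writer to replace it atomically (for example write-then-rename) so readers never observe a half-written file; that is outside this sketch.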
h6. Details:
* On creation of a CommitLogSegment, also hard-link the file in cdc_raw
* On the first write of a CDC-enabled mutation to a segment, we:
** Flag it as {{CDCState.CONTAINS}}
** Set a long tracking the {{CommitLogPosition}} of the first CDC-enabled
mutation in the segment
** Set a long in the CommitLogSegment tracking the offset of the end of the
last written CDC mutation in the segment, if higher than the previously known
highest CDC offset
* On subsequent writes to the segment, we update the offset of the highest
known CDC data
* On CommitLogSegment fsync, we write <segment_name>_cdc.idx in cdc_raw
containing the min offset and the end offset of the CDC data fsynced to disk for
that segment
* On segment discard, if CDCState == {{CDCState.PERMITTED}}, delete the segment
from both commitlog and cdc_raw
* On segment discard, if CDCState == {{CDCState.CONTAINS}}, delete the segment
in commitlog and update the <segment_name>_cdc.idx file with the end offset and
a DONE marker
* On segment replay, track the highest end offset of CDC-enabled mutations seen
in a segment and write it to <segment_name>_cdc.idx on completion of segment
replay. This should bridge the potential correctness gap of a node
writing to a segment and then dying before it can write the
<segment_name>_cdc.idx file.
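The server-side bookkeeping in the steps above can be sketched as a small per-segment state machine. This is a hypothetical model, not Cassandra's {{CommitLogSegment}} code: {{SegmentCdcTracker}}, {{onCdcWrite}}, {{onSync}}, {{onDiscard}} and {{finalIdx}} are illustrative names, and the idx text reuses the assumed "<min> <end>" plus "DONE" layout. One design choice worth calling out: the published end offset only advances when an fsync covers the highest CDC end, so the idx file never points past durable data.

```java
import java.util.Optional;

/**
 * Hypothetical per-segment CDC bookkeeping mirroring the steps above.
 * Not Cassandra's CommitLogSegment API; all names are illustrative.
 */
final class SegmentCdcTracker {
    enum CDCState { PERMITTED, CONTAINS }

    enum DiscardAction { DELETE_SEGMENT_AND_LINK, DELETE_SEGMENT_MARK_IDX_DONE }

    private CDCState state = CDCState.PERMITTED;
    private long firstCdcStart = -1;   // position of the first CDC-enabled mutation
    private long highestCdcEnd = -1;   // end of the highest CDC mutation written so far
    private long publishedCdcEnd = -1; // end offset last exposed via the idx file

    /** Records a CDC-enabled mutation occupying [start, end) in the segment. */
    synchronized void onCdcWrite(long start, long end) {
        if (state == CDCState.PERMITTED) {
            state = CDCState.CONTAINS; // the first CDC write flips the state
            firstCdcStart = start;
        }
        highestCdcEnd = Math.max(highestCdcEnd, end);
    }

    /**
     * Called after an fsync that made [0, syncedPosition) durable. Returns the
     * idx contents, or empty if there is nothing durable to publish yet. A CDC
     * write only partially covered by this fsync is published on a later sync.
     */
    synchronized Optional<String> onSync(long syncedPosition) {
        if (state != CDCState.CONTAINS) {
            return Optional.empty();
        }
        if (highestCdcEnd <= syncedPosition) {
            publishedCdcEnd = highestCdcEnd;
        }
        return publishedCdcEnd < 0
                ? Optional.empty()
                : Optional.of(firstCdcStart + " " + publishedCdcEnd);
    }

    /** Decides what segment discard should do, per the two discard rules. */
    synchronized DiscardAction onDiscard() {
        return state == CDCState.PERMITTED
                ? DiscardAction.DELETE_SEGMENT_AND_LINK
                : DiscardAction.DELETE_SEGMENT_MARK_IDX_DONE;
    }

    /** Final idx contents on discard; assumes the whole segment is synced by then. */
    synchronized String finalIdx() {
        if (state != CDCState.CONTAINS) {
            throw new IllegalStateException("segment holds no CDC data");
        }
        return firstCdcStart + " " + highestCdcEnd + "\nDONE";
    }
}
```

Under the same assumptions, replay could reuse {{onCdcWrite}} for every replayed CDC mutation and write the idx file once the segment has been fully replayed.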
This should allow clients to skip from the beginning of a file to the first CDC
mutation, track how far they've parsed, compare that against the _cdc.idx file's
end offset, and use the difference to decide when to parse new CDC data.
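A single client poll can be sketched as a pure function of the client's own progress and the idx values. This is a hypothetical helper ({{CdcPoller}} and {{Step}} are illustrative names): {{lastParsed}} is client-side state, starting at -1 before the first read, and the returned range is the slice of the segment to parse next.

```java
/**
 * Hypothetical client-side step for one poll of <segment_name>_cdc.idx.
 * lastParsed is the client's own bookkeeping (-1 before the first read).
 */
final class CdcPoller {
    /** Byte range [from, to) of the segment to parse next. */
    record Step(long from, long to, boolean complete) {
        boolean hasNewData() { return to > from; }
    }

    static Step next(long lastParsed, long idxMin, long idxEnd, boolean idxDone) {
        // On the first read, jump straight to the first CDC mutation.
        long from = Math.max(lastParsed, idxMin);
        // Never step backwards, even if a stale idx value is observed.
        long to = Math.max(from, idxEnd);
        // DONE is written together with the final end offset, so once [from, to)
        // is parsed the segment and its idx file can be cleaned up.
        return new Step(from, to, idxDone);
    }
}
```

After parsing a step, the client would set {{lastParsed = step.to()}} and wait for the idx file to change again.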
Any existing clients written against the initial implementation of CDC need only
add the <segment_name>_cdc.idx logic and a check for the DONE marker, so the
burden on users to support this should be quite small relative to the benefit of
having data available as soon as it's fsynced instead of at a non-deterministic
time when potentially unrelated tables are flushed.
Finally, we should look into extending the CommitLogReader interface to be
friendlier to realtime parsing, perhaps by taking a CommitLogDescriptor and
RandomAccessReader and resuming readSection calls, assuming the reader is
positioned at the start of a SyncSegment. It would probably also need to rewind
to the start of the current SyncSegment before returning so subsequent calls
respect this contract. This would avoid deserializing the descriptor and all
completed SyncSegments just to reach the desired point in the segment for
parsing.
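The resume-and-rewind contract described above can be illustrated on a toy framing. This does NOT model Cassandra's real SyncSegment markers (which carry CRCs and may be compressed or encrypted); it ASSUMES each section is a 4-byte length followed by its payload, with unwritten space zero-filled, and {{ResumableSectionReader.readSections}} is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of resumable section reading. ASSUMED framing: each section is a
 * 4-byte big-endian length followed by that many payload bytes, and unwritten
 * space is zero-filled. This only illustrates the resume/rewind contract.
 */
final class ResumableSectionReader {
    /**
     * Reads every complete section starting at position, which must be the
     * start of a section. On return the buffer is positioned at the start of the
     * first incomplete or unwritten section, so the next call can resume there.
     */
    static List<byte[]> readSections(ByteBuffer buf, int position) {
        List<byte[]> sections = new ArrayList<>();
        buf.position(position);
        while (buf.remaining() >= Integer.BYTES) {
            int sectionStart = buf.position();
            int length = buf.getInt();
            if (length <= 0 || length > buf.remaining()) {
                // Not (fully) written yet: rewind so the caller resumes here.
                buf.position(sectionStart);
                break;
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            sections.add(payload);
        }
        return sections;
    }
}
```

A tailing client would call {{readSections(buf, buf.position())}} each time the idx end offset advances, never re-reading sections it has already consumed.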
One alternative we discussed offline: instead of just storing the highest seen
CDC offset, we could store an offset per CDC mutation (potentially delta
encoded) in the idx file, allowing clients to seek to and parse only the
CDC-enabled mutations. My hunch is that the performance gain from doing so
wouldn't justify the complexity, given the SyncSegment deserialization and
seeking restrictions in the compressed and encrypted cases mentioned above.
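For a sense of what the delta-encoded alternative would store, here is a sketch that encodes sorted per-mutation offsets as differences written with unsigned LEB128 varints (7 bits per byte, high bit marking continuation). The choice of varints is an assumption for illustration; the discussion only says "potentially delta encoded", and {{DeltaOffsets}} is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical delta + varint encoding of per-mutation CDC offsets. */
final class DeltaOffsets {
    /** Encodes non-negative, non-decreasing offsets as varint deltas. */
    static byte[] encode(long[] offsets) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long off : offsets) {
            if (off < prev) {
                throw new IllegalArgumentException("offsets must be non-negative and non-decreasing");
            }
            long delta = off - prev;
            prev = off;
            // Unsigned LEB128: low 7 bits first, high bit set on all but the last byte.
            while ((delta & ~0x7FL) != 0) {
                out.write((int) ((delta & 0x7F) | 0x80));
                delta >>>= 7;
            }
            out.write((int) delta);
        }
        return out.toByteArray();
    }

    /** Decodes the output of encode back into absolute offsets. */
    static long[] decode(byte[] bytes) {
        long[] result = new long[bytes.length]; // every value takes at least one byte
        int count = 0;
        long prev = 0;
        long delta = 0;
        int shift = 0;
        for (byte b : bytes) {
            delta |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7; // continuation byte
                continue;
            }
            prev += delta;
            result[count++] = prev;
            delta = 0;
            shift = 0;
        }
        return Arrays.copyOf(result, count);
    }
}
```

For offsets 128, 300, 1000 and 70000 this produces 9 bytes instead of 32 bytes of raw longs; the saving is real, but clients of compressed or encrypted segments would still have to decode whole SyncSegments to reach those offsets, which is the cost weighed above.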
The only complication I can think of with the above design is that uncompressed,
mmapped CommitLogSegments on Windows are undeletable, but it'd be simple to
disallow configuring CDC with an uncompressed CommitLog on that platform.
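Such a restriction could be a startup validation along these lines. This is a hypothetical sketch: {{CdcConfigCheck.validate}} and its parameters are illustrative rather than real Cassandra configuration options, and in practice the OS name would come from {{System.getProperty("os.name")}}.

```java
import java.util.Locale;

/** Hypothetical startup check; parameter names are illustrative, not Cassandra config keys. */
final class CdcConfigCheck {
    /**
     * Rejects CDC with an uncompressed (mmapped) commit log on Windows, where a
     * memory-mapped segment cannot be deleted while it is mapped.
     */
    static void validate(boolean cdcEnabled, boolean commitLogCompressed, String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        if (cdcEnabled && !commitLogCompressed && windows) {
            throw new IllegalStateException(
                    "CDC requires a compressed commit log on Windows: "
                            + "uncompressed segments are mmapped and cannot be deleted");
        }
    }
}
```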
And as a final note: while the above might sound involved, it really shouldn't
be a big change from where we are with v1 of CDC, either in C* complexity and
code or in client implementation.