[
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631186#comment-14631186
]
Benedict commented on CASSANDRA-9669:
-------------------------------------
So, I am liking this approach less and less. It may be the least effort, but it
has too many sharp edges in critical portions of the system. It would also need
a separate, custom implementation for each of 2.0, 2.1, 2.2 _and_ 3.0.
I think I will introduce a new commit log expiration ledger, and just write to
it whenever we perform a {{discardCompletedSegments()}} call. The ledger is then
replayed prior to CL replay, to build the state of which records we consider
replayable. Initially, I will limit this to a simple statement of "latest
replay position we can be certain to have replayed to", since this is uniform
behaviour for 2.0+. 2.1+ easily supports ranges, which can be implemented when
we deliver CASSANDRA-8496.
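
As a rough illustration only, the ledger idea might look something like the
sketch below. It is written in Python for brevity and is not Cassandra code:
ReplayPosition, ExpirationLedger, should_replay and the one-record-per-line
file format are all hypothetical names and choices made for this example. The
ledger is appended to at the point where discardCompletedSegments() would be
called, and is read back before commit log replay to decide which records are
still replayable.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class ReplayPosition:
    """Hypothetical stand-in: ordered by (segment_id, offset)."""
    segment_id: int
    offset: int


class ExpirationLedger:
    """Append-only file with one 'segment_id offset' record per line.

    The file format is an assumption made for this sketch.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, position: ReplayPosition) -> None:
        # Would be called alongside discardCompletedSegments().
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{position.segment_id} {position.offset}\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self) -> Optional[ReplayPosition]:
        # Read before commit log replay; returns the latest recorded position.
        latest = None
        try:
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    if not line.endswith("\n"):
                        continue  # ignore a torn final write
                    segment_id, offset = (int(p) for p in line.split())
                    position = ReplayPosition(segment_id, offset)
                    if latest is None or position > latest:
                        latest = position
        except FileNotFoundError:
            return None  # no ledger yet: nothing has been expired
        return latest


def should_replay(record: ReplayPosition,
                  latest: Optional[ReplayPosition]) -> bool:
    # A commit log record is replayable unless the ledger says it was expired.
    return latest is None or record > latest
```

With this shape, an sstable from a later flush that happens to reach disk
first has no influence on what is replayed; only positions explicitly written
to the ledger do.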
> Commit Log Replay is Broken
> ---------------------------
>
> Key: CASSANDRA-9669
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Priority: Critical
> Labels: correctness
> Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order,
> on restart we simply take the maximum replay position of any sstable on disk,
> and ignore anything prior.
> It is quite possible for there to be two flushes triggered for a given table,
> and for the second to finish first by virtue of containing a much smaller
> quantity of live data (or perhaps the disk is just under less pressure). If
> we crash before the first sstable has been written, then on restart the data
> it would have represented will disappear, since we will not replay the CL
> records.
> This looks to be a bug present since time immemorial, and also seems pretty
> serious.
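
The failure described in the issue can be reproduced in a toy model. The
sketch below is a deliberate simplification and not Cassandra code: the
recover function, the integer replay positions and the tuple layout are all
assumptions made for illustration. It models restart as "take the maximum
replay position of any sstable on disk and replay only what comes after it".

```python
def recover(commit_log, sstables):
    """Toy model of restart.

    commit_log: list of (position, mutation) pairs, in write order.
    sstables:   list of (replay_position, mutations) for sstables that were
                fully written before the crash.
    Returns the set of mutations visible after restart.
    """
    # Restart takes the maximum replay position of any sstable on disk...
    cutoff = max((position for position, _ in sstables), default=0)
    durable = {m for _, mutations in sstables for m in mutations}
    # ...and ignores every commit log record at or before it.
    replayed = {m for position, m in commit_log if position > cutoff}
    return durable | replayed


commit_log = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

# Flush 1 covers positions 1-3 (lots of data, slow to write).
# Flush 2 covers position 4 (little data) and finishes first.
# The node crashes before flush 1 has written its sstable.
sstables_on_disk = [(4, ["d"])]

visible = recover(commit_log, sstables_on_disk)
lost = {m for _, m in commit_log} - visible
```

In this model, visible contains only "d", and "a", "b" and "c" are lost even
though they are still present in the commit log.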