[ 
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703619#comment-14703619
 ] 

Benedict commented on CASSANDRA-9669:
-------------------------------------

I've updated the patch to include a unit test, and to fix two more problems: 
one typo, and the {{shouldReplay}} logic, which was still incorrect. Whenever I 
interface with ReplayPosition I have a strong urge to rewrite it. It is 
terribly counterintuitive, but I guess that's what we have unit tests for.

Long story short, the ranges are inclusive-start, exclusive-end, the inverse of 
what you expected. This confusion stems from the fact that we use points both to 
represent ranges (i.e. commit log entries) and to demarcate ranges 
(those that have been persisted), which doesn't really make sense. But during 
replay, and on a write, the {{ReplayPosition}} for a record is the position in 
the segment _directly following_ its serialization location, i.e. it is the 
exclusive upper bound of its bytes. It is, in effect, represented by the one 
number that falls outside of its on-disk representation.
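
To make the bounds concrete, here is a minimal sketch in Java. It is not the patch's code: the class and method names are hypothetical, positions are reduced to plain byte offsets within one segment, and it assumes persisted ranges begin and end on record boundaries. Under those assumptions, a record occupying bytes [s, e) is identified by e, so it lies inside persisted bytes [start, end) exactly when start < e <= end.

```java
// Hypothetical illustration only; these are not Cassandra's actual classes.
final class ReplayBoundsSketch {
    /** Position of a record: the offset directly after its last serialized byte. */
    static long positionOf(long recordStart, int serializedSize) {
        return recordStart + serializedSize;
    }

    /**
     * True if a record with the given position lies within the persisted bytes
     * [rangeStart, rangeEnd). Because the position is the record's exclusive
     * upper bound, the comparison is exclusive at the start and inclusive at the end.
     */
    static boolean isPersisted(long position, long rangeStart, long rangeEnd) {
        return rangeStart < position && position <= rangeEnd;
    }

    /** A record needs replaying when the persisted range does not cover it. */
    static boolean shouldReplay(long position, long rangeStart, long rangeEnd) {
        return !isPersisted(position, rangeStart, rangeEnd);
    }
}
```

For example, with persisted bytes [100, 150), a record written at bytes [100, 150) has position 150 and is skipped, while the record before it at bytes [50, 100) has position 100 and must still be replayed.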

Anyway, it's good for a proper review now. I accidentally collapsed the most 
recent follow-up commit with the one I uploaded this morning, but mostly this 
was just the unit test, plus those two items (one removed {{!}}, and the 
inverted bounds checking).

> If sstable flushes complete out of order, on restart we can fail to replay 
> necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: correctness
>             Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, 
> on restart we simply take the maximum replay position of any sstable on disk, 
> and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, 
> and for the second to finish first by virtue of containing a much smaller 
> quantity of live data (or perhaps the disk is just under less pressure). If 
> we crash before the first sstable has been written, then on restart the data 
> it would have represented will disappear, since we will not replay the CL 
> records.
> This looks to be a bug present since time immemorial, and also seems pretty 
> serious.
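
The scenario in the description can be sketched in a few lines of Java. This is an illustration under stated assumptions, not Cassandra's implementation: the names are hypothetical, positions are plain numbers, and each flushed sstable is assumed to record the range of positions it covers as a {start, end} pair (start exclusive, end inclusive). The first method mirrors the "take the maximum and ignore anything prior" behaviour described above; the second is one possible alternative that skips a record only when some persisted range actually covers it.

```java
// Hypothetical illustration only; these are not Cassandra's actual classes.
final class FlushOrderSketch {
    /** Behaviour as described: skip everything at or before the maximum persisted position. */
    static boolean shouldReplayUsingMax(long position, long[][] persisted) {
        long max = Long.MIN_VALUE;
        for (long[] range : persisted) {
            max = Math.max(max, range[1]);
        }
        return position > max;
    }

    /** Alternative: skip a record only if a persisted range (start, end] covers it. */
    static boolean shouldReplayUsingRanges(long position, long[][] persisted) {
        for (long[] range : persisted) {
            if (range[0] < position && position <= range[1]) {
                return false;
            }
        }
        return true;
    }
}
```

Suppose two flushes cover positions (0, 100] and (100, 120], and only the second reaches disk before a crash, so the persisted set is {{100, 120}}. A record at position 60 is then skipped by the maximum-based check, which loses its data, but is replayed by the range-based check.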



