[jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records

Sylvain Lebresne (JIRA) Fri, 15 Jan 2016 03:02:12 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101631#comment-15101631 ]


Sylvain Lebresne commented on CASSANDRA-9669:
---------------------------------------------

It's possible I'm missing something here, but say we commit this to 3.4: we 
can still bump the sstable version to {{"mb"}}, writing the new lower-bound 
replay position at the end of the metadata, and 3.4 would _always_ expect that 
lower bound to be present when reading {{"mb"}} sstables. If someone decides to 
use an {{"mb"}} sstable with 3.3, this will still work because 1) to the best 
of my knowledge, 3.3 doesn't reject sstables with a future version (and it's 
the same "major" sstable version, so streaming and such are not impacted), and 
2) it will simply ignore the additional data it doesn't know about at the end 
of the metadata file.
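
A minimal Python sketch of why appending the field keeps older readers working. The byte layout (a replay position as a big-endian 64-bit segment id plus a 32-bit offset) and the function names are hypothetical simplifications, not Cassandra's actual metadata serializer; the only idea taken from the comment is that "mb" writes the lower bound last, a 3.4-style reader expects it, and a 3.3-style reader stops after the fields it knows.

```python
import io
import struct

# Hypothetical encoding of a commit log replay position:
# (segment id as signed 64-bit, offset as signed 32-bit), big-endian.
POS = struct.Struct(">qi")


def write_metadata(version, upper_bound, lower_bound=None):
    """Serialize a toy metadata component for an sstable of the given version."""
    buf = io.BytesIO()
    # Fields understood by both "ma" and "mb" readers come first.
    buf.write(POS.pack(*upper_bound))
    # "mb" appends the new lower bound after everything older readers know about.
    if version >= "mb":
        buf.write(POS.pack(*lower_bound))
    return buf.getvalue()


def read_metadata_33(data):
    """3.3-style reader: knows only the upper bound and never looks at trailing bytes."""
    stream = io.BytesIO(data)
    return {"upper": POS.unpack(stream.read(POS.size))}


def read_metadata_34(version, data):
    """3.4-style reader: always expects the lower bound in "mb" sstables."""
    stream = io.BytesIO(data)
    result = {"upper": POS.unpack(stream.read(POS.size))}
    if version >= "mb":
        result["lower"] = POS.unpack(stream.read(POS.size))
    return result


data = write_metadata("mb", upper_bound=(7, 4096), lower_bound=(5, 128))
print(read_metadata_33(data))        # {'upper': (7, 4096)}
print(read_metadata_34("mb", data))  # {'upper': (7, 4096), 'lower': (5, 128)}
```

Because the old reader consumes a fixed prefix and never checks for end-of-stream, the extra bytes are harmless to it. The cost is the one noted below: the two bounds end up far apart in the serialized form.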

The only ugliness I see is that, in the sstable metadata, the lower and upper 
bounds are not serialized next to each other, but that's a very minor and very 
localized (in the code) ugliness that can easily be fixed later in 4.0.


> If sstable flushes complete out of order, on restart we can fail to replay 
> necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: correctness
>             Fix For: 2.2.x, 3.0.x, 3.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, 
> on restart we simply take the maximum replay position of any sstable on disk, 
> and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, 
> and for the second to finish first by virtue of containing a much smaller 
> quantity of live data (or perhaps the disk is just under less pressure). If 
> we crash before the first sstable has been written, then on restart the data 
> it would have represented will disappear, since we will not replay the CL 
> records.
> This looks to be a bug present since time immemorial, and also seems pretty 
> serious.
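
The failure described in the issue can be reproduced with a toy model, sketched here in Python. It assumes a replay position is a plain integer offset into the commit log and that each sstable on disk records the [lower, upper) range of positions its memtable covered; the function names and the interval check are illustrative assumptions, not Cassandra's replay code. Only the idea of a per-sstable lower bound comes from the comment above.

```python
# Toy model: a replay position is an integer offset into the commit log, and
# each sstable on disk is described by the (lower, upper) range of positions
# whose data it contains (lower inclusive, upper exclusive).


def replay_naive(records, sstables):
    """Pre-fix behaviour: skip everything before the largest upper bound on disk."""
    start = max((upper for _, upper in sstables), default=0)
    return [pos for pos in records if pos >= start]


def replay_with_lower_bounds(records, sstables):
    """Replay every record that no flushed sstable's [lower, upper) range covers."""
    def covered(pos):
        return any(lower <= pos < upper for lower, upper in sstables)
    return [pos for pos in records if not covered(pos)]


# Commit log records at positions 0..99.
records = list(range(100))
# Flush A would cover [0, 50) but we crash before its sstable is written;
# flush B covers [50, 80), contains less data, and finishes first.
on_disk = [(50, 80)]

lost = sorted(set(records) - set(replay_naive(records, on_disk)) - set(range(50, 80)))
print(lost == list(range(50)))  # True: flush A's records are never replayed
print(replay_with_lower_bounds(records, on_disk) == list(range(50)) + list(range(80, 100)))  # True
```

With only the maximum upper bound, replay starts at 80 and positions 0 to 49 vanish; once each sstable also carries its lower bound, the uncovered gap is visible and can be replayed. A real implementation would merge the ranges rather than scan every sstable per record.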



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
