[jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records

Benedict (JIRA) Tue, 18 Aug 2015 10:56:46 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701694#comment-14701694
 ]


Benedict commented on CASSANDRA-9669:
-------------------------------------

Thanks. Looking at the patch, it's clear I wrote it in a bit of a rush, as 
there's another problem with it: it doesn't actually delay compaction with 
STCS. I will need to make STCS rely on the notifyAdded, much as LCS does, or 
introduce a new collection in {{DataTracker}} (this latter being less invasive, 
but uglier)

I will look to introduce at least a basic unit test for this problem, as well 
as fix these problems, tomorrow. However the main thrust of the patch is still 
good, so it's up to you if you want to abort review in lieu of this.

> If sstable flushes complete out of order, on restart we can fail to replay 
> necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: correctness
>             Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, 
> on restart we simply take the maximum replay position of any sstable on disk, 
> and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, 
> and for the second to finish first by virtue of containing a much smaller 
> quantity of live data (or perhaps the disk is just under less pressure). If 
> we crash before the first sstable has been written, then on restart the data 
> it would have represented will disappear, since we will not replay the CL 
> records.
> This looks to be a bug present since time immemorial, and also seems pretty 
> serious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9669) If sstable flushes complete out of order, on restart we can fail to replay necessary commit log records

Reply via email to