[ 
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877154#comment-14877154
 ] 

Benedict commented on CASSANDRA-9669:
-------------------------------------

I've addressed your nits, but on giving the patch another review I realise that 
the approach I've taken with the new parent compaction strategy isn't going to 
cut it in 2.1. It doesn't cover the complete set of operations that can mark 
things compacting, and if we e.g. redistribute index summaries (or attempt an 
"all sstable" operation) we may screw up the state badly. 

In 2.1 we can instead safely mark sstables compacting before we make them 
available in the live set, but this too has problems: any "all sstable" 
operation will fail if flushes have fallen far behind, or if secondary index 
flushes are taking too long. So we will probably have to introduce a new set of 
sstables into the DataTracker.View that tracks these _almost complete_ 
sstables and filters them from any "all sstable" operation. This is a rather 
ugly behavioural edge case, and in 3.X I'll see if there's anything we can do 
to make it more apparent to consumers of the API.
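The workaround above can be sketched roughly as follows. This is a minimal illustration, not the actual DataTracker/View API: all class, field, and method names here are hypothetical. The idea is that sstables flushed but not yet fully available are already marked compacting, so global ("all sstable") operations must filter them out rather than fail trying to mark them compacting a second time.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a View that tracks "almost complete" sstables
// (marked compacting before they join the live set) and hides them
// from any "all sstable" operation such as summary redistribution.
public class ViewSketch {
    private final Set<String> live = new HashSet<>();
    // flushed but not yet fully available; already marked compacting
    private final Set<String> almostComplete = new HashSet<>();

    // flush makes the sstable visible, but it stays "almost complete"
    public void beginFlush(String sstable) {
        almostComplete.add(sstable);
        live.add(sstable);
    }

    // once the flush fully completes, the sstable becomes eligible
    // for global operations like any other
    public void finishFlush(String sstable) {
        almostComplete.remove(sstable);
    }

    // "all sstable" operations skip the almost-complete set; otherwise
    // they would block or fail on sstables already marked compacting
    public Set<String> allForGlobalOperation() {
        Set<String> result = new HashSet<>(live);
        result.removeAll(almostComplete);
        return result;
    }
}
```

Under this sketch, a summary redistribution started while one flush is still in flight simply operates on the fully-flushed sstables instead of failing outright.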

> If sstable flushes complete out of order, on restart we can fail to replay 
> necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: correctness
>             Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, 
> on restart we simply take the maximum replay position of any sstable on disk, 
> and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, 
> and for the second to finish first by virtue of containing a much smaller 
> quantity of live data (or perhaps the disk is just under less pressure). If 
> we crash before the first sstable has been written, then on restart the data 
> it would have represented will disappear, since we will not replay the CL 
> records.
> This looks to be a bug present since time immemorial, and also seems pretty 
> serious.
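The quoted failure mode can be sketched numerically. Assume (hypothetically) flush A covers commit-log positions up to 100 and flush B covers positions up to 150; B completes first because it holds less data. If we crash before A's sstable reaches disk, restart takes the maximum replay position over the sstables actually on disk and skips everything before it, losing A's records. All names and positions here are illustrative, not Cassandra's real types.

```java
import java.util.List;

// Illustrates why taking the max replay position across on-disk
// sstables loses data when flushes complete out of order.
public class ReplayPositionBug {
    // restart logic as described in the issue: replay only commit-log
    // records after the highest replay position found on disk
    public static long replayFrom(List<Long> onDiskReplayPositions) {
        return onDiskReplayPositions.stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(0L);
    }

    public static void main(String[] args) {
        // only flush B (position 150) hit disk before the crash;
        // flush A's records, at positions below 100, are never replayed
        long from = replayFrom(List.of(150L));
        System.out.println("replaying commit log from position " + from);
    }
}
```

With both flushes on disk the max is correct; with only the later flush present, every record belonging to the earlier, unfinished flush falls below the replay point and silently disappears.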



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)