[ 
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701436#comment-14701436
 ] 

Branimir Lambov commented on CASSANDRA-9669:
--------------------------------------------

First round of review, focussing at the replay logic:

{{ReplayPosition.add}} is suspect, as [in extending the 
end|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-2b76f3efa4aaa38339ab8f4869c9b7bfR70]
 {{extend != null}} implies {{extend.getKey().compareTo(end) >= 0}} and this 
makes the branch reachable only if {{end == entend.getKey()}} which I don't 
think is what you want.

The other direction appears correct, but the two can't be symmetrical as we can 
only key into starts. This is sufficient as {{range\[i\].key/lower < 
range\[i\].value/upper < range\[i+1\].key/lower}} (which I think you should 
clarify in a comment) and thus the entry we want to extend us on the right is 
the one before the first start larger than our end. {{end = max(end, 
floorEntry(end).value)}} should do it.

There's no need to recursively rerun, extending in either direction has no 
effect on the other side so you can do the two in sequence.

{{ReplayPosition.min}} should be a bit more descriptively named, e.g. 
"minUpperBound" or "firstNotCovered".

I'd call 
[CommitLogReplayer.replay|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-348a1347dacf897385fb0a97116a1b5eR166]
 "shouldReplay". Consider moving the containment logic (the last two lines) 
into {{ReplayPosition}} to keep the details of the range format within the 
class. I believe the intervals are start-inclusive, end-exclusive, thus the 
[final 
comparison|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-348a1347dacf897385fb0a97116a1b5eR172]
 should be {{<=}}.

Could you update the JavaDoc, especially returned values (e.g. 
[here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-98f5acb96aa6d684781936c141132e2aR2498]
 and 
[here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-f0a15c3588b56c5ce53ece7c48e325b5R247])?
 The comment 
[here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-348a1347dacf897385fb0a97116a1b5eR333]
 is outdated.

Is 
[FlushRunnable.runMayThrow|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-f0a15c3588b56c5ce53ece7c48e325b5R352]
 still needed? DiskAwareRunnable?

I'll continue looking at rest of the code tomorrow.

> If sstable flushes complete out of order, on restart we can fail to replay 
> necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Critical
>              Labels: correctness
>             Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order, 
> on restart we simply take the maximum replay position of any sstable on disk, 
> and ignore anything prior. 
> It is quite possible for there to be two flushes triggered for a given table, 
> and for the second to finish first by virtue of containing a much smaller 
> quantity of live data (or perhaps the disk is just under less pressure). If 
> we crash before the first sstable has been written, then on restart the data 
> it would have represented will disappear, since we will not replay the CL 
> records.
> This looks to be a bug present since time immemorial, and also seems pretty 
> serious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to