[
https://issues.apache.org/jira/browse/CASSANDRA-9669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701436#comment-14701436
]
Branimir Lambov commented on CASSANDRA-9669:
--------------------------------------------
First round of review, focussing at the replay logic:
{{ReplayPosition.add}} is suspect, as [in extending the
end|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-2b76f3efa4aaa38339ab8f4869c9b7bfR70]
{{extend != null}} implies {{extend.getKey().compareTo(end) >= 0}} and this
makes the branch reachable only if {{end == entend.getKey()}} which I don't
think is what you want.
The other direction appears correct, but the two can't be symmetrical as we can
only key into starts. This is sufficient as {{range\[i\].key/lower <
range\[i\].value/upper < range\[i+1\].key/lower}} (which I think you should
clarify in a comment) and thus the entry we want to extend us on the right is
the one before the first start larger than our end. {{end = max(end,
floorEntry(end).value)}} should do it.
There's no need to recursively rerun, extending in either direction has no
effect on the other side so you can do the two in sequence.
{{ReplayPosition.min}} should be a bit more descriptively named, e.g.
"minUpperBound" or "firstNotCovered".
I'd call
[CommitLogReplayer.replay|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-348a1347dacf897385fb0a97116a1b5eR166]
"shouldReplay". Consider moving the containment logic (the last two lines)
into {{ReplayPosition}} to keep the details of the range format within the
class. I believe the intervals are start-inclusive, end-exclusive, thus the
[final
comparison|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-348a1347dacf897385fb0a97116a1b5eR172]
should be {{<=}}.
Could you update the JavaDoc, especially returned values (e.g.
[here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-98f5acb96aa6d684781936c141132e2aR2498]
and
[here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-f0a15c3588b56c5ce53ece7c48e325b5R247])?
The comment
[here|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-348a1347dacf897385fb0a97116a1b5eR333]
is outdated.
Is
[FlushRunnable.runMayThrow|https://github.com/apache/cassandra/compare/trunk...belliottsmith:9669-2.0#diff-f0a15c3588b56c5ce53ece7c48e325b5R352]
still needed? DiskAwareRunnable?
I'll continue looking at rest of the code tomorrow.
> If sstable flushes complete out of order, on restart we can fail to replay
> necessary commit log records
> -------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-9669
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9669
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Priority: Critical
> Labels: correctness
> Fix For: 3.x, 2.1.x, 2.2.x, 3.0.x
>
>
> While {{postFlushExecutor}} ensures it never expires CL entries out-of-order,
> on restart we simply take the maximum replay position of any sstable on disk,
> and ignore anything prior.
> It is quite possible for there to be two flushes triggered for a given table,
> and for the second to finish first by virtue of containing a much smaller
> quantity of live data (or perhaps the disk is just under less pressure). If
> we crash before the first sstable has been written, then on restart the data
> it would have represented will disappear, since we will not replay the CL
> records.
> This looks to be a bug present since time immemorial, and also seems pretty
> serious.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)