[
https://issues.apache.org/jira/browse/CASSANDRA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040506#comment-17040506
]
David Capwell commented on CASSANDRA-15553:
-------------------------------------------
I took a closer look at IR messaging, and this is what I see.
IR messaging follows a fire-and-forget pattern, so any ephemeral issue can lead
to messages not being seen (tests show this in CASSANDRA-15564, and it has been
reported as an issue with current repair in CASSANDRA-15566). This patch relies
on the FINALIZE_COMMIT_MSG being seen on the coordinator of the IR preview
repair in order to detect the conflict, but the message is seen asynchronously.
The participants may see it while validation is running, while the coordinator
sees it only after it has received all validations (so the session is already
complete); in that case you have the same issue as reported by this JIRA.
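To make the ordering problem concrete, here is a minimal model of it. This is
only a sketch: PreviewCoordinatorModel and its methods are hypothetical names,
not classes from the patch, and the model assumes the coordinator decides the
outcome as soon as the last validation arrives.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the coordinator side of a preview repair (not Cassandra code). */
final class PreviewCoordinatorModel {
    private final Set<String> pendingValidations = new HashSet<>();
    private boolean conflictSeen;     // a finalize-commit message arrived in time
    private boolean complete;         // all validations have been received
    private boolean reportedConflict; // outcome fixed at completion time

    PreviewCoordinatorModel(String... participants) {
        pendingValidations.addAll(Arrays.asList(participants));
    }

    /** Models the coordinator seeing FINALIZE_COMMIT_MSG for an intersecting IR. */
    void onFinalizeCommit() {
        if (complete) {
            return; // too late: the session outcome has already been decided
        }
        conflictSeen = true;
    }

    /** Models the coordinator receiving one participant's validation. */
    void onValidationComplete(String participant) {
        pendingValidations.remove(participant);
        if (pendingValidations.isEmpty() && !complete) {
            complete = true;
            reportedConflict = conflictSeen;
        }
    }

    boolean isComplete() {
        return complete;
    }

    boolean reportedConflict() {
        return reportedConflict;
    }
}
```

If onFinalizeCommit() runs before the last onValidationComplete(), the conflict
is reported; if it runs after, the session completes as if no IR had finished,
even though the participants may have validated different repaired sets.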
This patch also effectively blocks preview and IR from running on the same
range, as the preview will fail with a conflict*. So IR should not be scheduled
while a preview is running, and preview should not be scheduled while IR is
running (else we waste resources on validation). Effectively, whatever is
scheduling the repairs will have to be enhanced to handle this, which adds more
complexity for operators.
I actually wonder if we can remove this restriction. What it looks like to me
is that repairedAt is system time (i.e., it could drift, roll backwards, etc.),
but we could keep track of the largest one and make sure this counter is
monotonic. With a data structure of
* largest contiguous commit (long)
* inFlight (array of long)
We could make sure that we (the coordinator) always produce a repairedAt larger
than any we know of, which lets preview take a snapshot of the state at the
start of coordination. With this snapshot, we filter for sstables that are
repaired and have repairedAt <= the largest contiguous commit in the snapshot;
this should give preview repair effectively snapshot isolation (assuming
compaction also maintains repairedAt).
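A rough sketch of the data structure described above, to show the idea rather
than a real implementation. RepairedAtTracker and all of its methods are
hypothetical names, not existing Cassandra APIs. The sketch also assumes that
an aborted session can be retired the same way as a committed one, since it
leaves no sstables carrying its repairedAt.

```java
import java.util.TreeSet;

/** Hypothetical coordinator-side tracker (not Cassandra code). */
final class RepairedAtTracker {
    private long largestIssued;           // highest repairedAt handed out so far
    private long largestContiguousCommit; // every issued value <= this has finished
    private final TreeSet<Long> inFlight = new TreeSet<>(); // issued, not yet finished

    /** Issues a repairedAt strictly larger than any issued before, even if the clock went backwards. */
    synchronized long begin(long wallClockMillis) {
        long next = Math.max(wallClockMillis, largestIssued + 1);
        largestIssued = next;
        inFlight.add(next);
        return next;
    }

    /** Retires a session and advances the watermark past every finished value. */
    synchronized void finish(long repairedAt) {
        if (!inFlight.remove(repairedAt)) {
            throw new IllegalArgumentException("unknown repairedAt: " + repairedAt);
        }
        largestContiguousCommit = inFlight.isEmpty() ? largestIssued : inFlight.first() - 1;
    }

    /** Value a preview repair would capture at the start of coordination. */
    synchronized long snapshot() {
        return largestContiguousCommit;
    }

    /** Filter from the text: repaired and repairedAt <= snapshot. */
    static boolean visibleToPreview(boolean repaired, long repairedAt, long snapshot) {
        return repaired && repairedAt <= snapshot;
    }
}
```

A preview would call snapshot() once and pass the result to every participant,
so that all nodes validate the same repaired set regardless of which IR sessions
finish while validation is running.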
* In CASSANDRA-15564 I show that preview doesn't properly check session
failures, run [this
test|https://github.com/apache/cassandra/pull/446/files#diff-af4a07a2b44695f510dddb0c102e1953R28]
and [this
one|https://github.com/apache/cassandra/pull/446/files#diff-ca9f3b43ad8ff955d6ddd2ef4d2b6904R28]
without the change in the JIRA to see it. Your tests behave differently because
you monitor notifications directly instead of using nodetool.
> Preview repair should include sstables from finalized incremental repair
> sessions
> ---------------------------------------------------------------------------------
>
> Key: CASSANDRA-15553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15553
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
> Fix For: 4.0-alpha
>
>
> When running a preview repair we currently grab all repaired sstables,
> problem is that we depend on compaction to move the sstables from pending to
> repaired so we might have different data marked repaired on different nodes.
> Including any sstables from finalized incremental repair sessions as repaired
> will solve this.
> Another problem is that validations don't start at exactly the same time on
> different nodes, so if an incremental repair finishes while the preview
> repair is running we might also validate the wrong repaired set. We should
> fail the preview repair if an intersecting incremental repair finishes during
> the preview repair.