Stefan Podkowinski created CASSANDRA-15027:
----------------------------------------------
Summary: Handle IR prepare phase failures less race prone by
waiting for all results
Key: CASSANDRA-15027
URL: https://issues.apache.org/jira/browse/CASSANDRA-15027
Project: Cassandra
Issue Type: Bug
Components: Consistency/Repair, Local/Compaction
Reporter: Stefan Podkowinski
Assignee: Stefan Podkowinski
Fix For: 4.x
Handling incremental repairs as a coordinator begins by sending a
{{PrepareConsistentRequest}} message to all participants, which may also
include the coordinator itself. Participants will run anti-compactions upon
receiving such a message and report the result of the operation back to the
coordinator.
Once we receive a failure response from any of the participants, we fail fast
in {{CoordinatorSession.handlePrepareResponse()}}, which in turn completes
the {{prepareFuture}} that {{RepairRunnable}} is blocking on. The repair
command then terminates with an error status, as expected.
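To illustrate the fail-fast behaviour described above, here is a minimal sketch. It is not the actual {{CoordinatorSession}} code: the class {{FailFastPrepare}}, the string participant ids and the {{stillRunning()}} helper are made up for illustration, and only the names {{handlePrepareResponse}} and {{prepareFuture}} are taken from the description.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the coordinator side of the prepare phase.
class FailFastPrepare {
    // Participants that have not reported a prepare result yet.
    private final Set<String> pending;
    // The future the repair command would block on.
    final CompletableFuture<Boolean> prepareFuture = new CompletableFuture<>();

    FailFastPrepare(Collection<String> participants) {
        this.pending = new HashSet<>(participants);
    }

    synchronized void handlePrepareResponse(String participant, boolean success) {
        if (!pending.remove(participant)) {
            return; // unknown participant or duplicate response
        }
        if (!success) {
            // Fail fast: the future completes although other participants
            // (possibly this very node) may still be anti-compacting.
            prepareFuture.complete(false);
            return;
        }
        if (pending.isEmpty()) {
            // No-op if the future was already completed by a failure.
            prepareFuture.complete(true);
        }
    }

    // Assumed helper: who has not answered yet.
    synchronized Set<String> stillRunning() {
        return new HashSet<>(pending);
    }

    public static void main(String[] args) {
        FailFastPrepare session = new FailFastPrepare(Arrays.asList("node1", "node2", "node3"));
        session.handlePrepareResponse("node2", false);
        System.out.println("done=" + session.prepareFuture.isDone()
                           + ", still running=" + session.stillRunning());
    }
}
```

In this sketch the future is already done after the first failure, while two participants have not reported back yet.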
The issue is that when the node is both coordinator and participant, we
may end up with a local session and submitted anti-compactions that are
executed without any coordination with the coordinator session (on the same node).
As a result, running repair commands one right after another may cause
overlapping execution of anti-compactions, which in turn causes the repair
to fail again with the following (misleading) message in the logs:
"Prepare phase for incremental repair session %s has failed because it
encountered intersecting sstables belonging to another incremental repair
session (%s). This is caused by starting an incremental repair session before a
previous one has completed. Check nodetool repair_admin for hung sessions and
fix them."
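As a rough sketch of what "waiting for all results" from the summary could look like: the coordinator records failures but only completes the future once every participant has reported, so no anti-compaction is still in flight when the repair command returns. This is an assumption about the approach, not the actual patch; the class {{WaitForAllPrepare}}, the string participant ids and the {{anyFailed}} flag are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical coordinator-side bookkeeping that waits for all prepare results.
class WaitForAllPrepare {
    // Participants that have not reported a prepare result yet.
    private final Set<String> pending;
    // Remembers whether any participant reported a failure.
    private boolean anyFailed = false;
    // The future the repair command would block on.
    final CompletableFuture<Boolean> prepareFuture = new CompletableFuture<>();

    WaitForAllPrepare(Collection<String> participants) {
        this.pending = new HashSet<>(participants);
    }

    synchronized void handlePrepareResponse(String participant, boolean success) {
        if (!pending.remove(participant)) {
            return; // unknown participant or duplicate response
        }
        if (!success) {
            anyFailed = true; // record the failure, but keep waiting
        }
        if (pending.isEmpty()) {
            // Every participant has finished (or given up on) its
            // anti-compaction, so it is safe to report the overall result.
            prepareFuture.complete(!anyFailed);
        }
    }

    public static void main(String[] args) {
        WaitForAllPrepare session = new WaitForAllPrepare(Arrays.asList("node1", "node2"));
        session.handlePrepareResponse("node1", false);
        System.out.println("done after first failure: " + session.prepareFuture.isDone());
        session.handlePrepareResponse("node2", true);
        System.out.println("done after all results: " + session.prepareFuture.isDone());
    }
}
```

A participant that never responds would keep this sketch waiting forever, so a real implementation would presumably also need a timeout or a way to cancel outstanding anti-compactions.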