[jira] [Created] (CASSANDRA-15027) Handle IR prepare phase failures less race prone by waiting for all results

Stefan Podkowinski (JIRA) Sun, 17 Feb 2019 23:50:09 -0800

Stefan Podkowinski created CASSANDRA-15027:
----------------------------------------------


             Summary: Handle IR prepare phase failures less race prone by waiting for all results
                 Key: CASSANDRA-15027
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15027
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Repair, Local/Compaction
            Reporter: Stefan Podkowinski
            Assignee: Stefan Podkowinski
             Fix For: 4.x


Handling incremental repairs as a coordinator begins by sending a 
{{PrepareConsistentRequest}} message to all participants, which may also 
include the coordinator itself. Participants will run anti-compactions upon 
receiving such a message and report the result of the operation back to the 
coordinator.

Once we receive a failure response from any of the participants, we fail-fast 
in {{CoordinatorSession.handlePrepareResponse()}}, which in turn completes 
the {{prepareFuture}} that {{RepairRunnable}} is blocking on. The repair 
command then terminates with an error status, as expected.
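
To illustrate the difference between failing fast and waiting for all results, here is a minimal sketch. It is not the actual {{CoordinatorSession}} code: the {{PrepareTracker}} class, its constructor parameters and the participant names are hypothetical, and only the names {{handlePrepareResponse}} and {{prepareFuture}} are borrowed from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of coordinator-side prepare bookkeeping.
// This is NOT Cassandra's CoordinatorSession; all names are illustrative.
class PrepareTracker {
    private final Map<String, Boolean> responses = new HashMap<>();
    private final int expectedResponses;
    private final boolean waitForAll;

    // Completes with true if every participant prepared successfully, false otherwise.
    final CompletableFuture<Boolean> prepareFuture = new CompletableFuture<>();

    PrepareTracker(int expectedResponses, boolean waitForAll) {
        this.expectedResponses = expectedResponses;
        this.waitForAll = waitForAll;
    }

    synchronized void handlePrepareResponse(String participant, boolean success) {
        responses.put(participant, success);
        if (!success && !waitForAll) {
            // Fail-fast: the caller is unblocked while other participants
            // (possibly the local node) may still be running anti-compaction.
            prepareFuture.complete(false);
            return;
        }
        if (responses.size() == expectedResponses) {
            // Every participant has reported, so no prepare work is still in flight.
            // complete() is a no-op if the future was already completed above.
            prepareFuture.complete(!responses.containsValue(false));
        }
    }

    public static void main(String[] args) {
        PrepareTracker tracker = new PrepareTracker(2, true);
        tracker.handlePrepareResponse("remote", false);
        System.out.println("done after first failure: " + tracker.prepareFuture.isDone());
        tracker.handlePrepareResponse("local", true);
        System.out.println("done after all responses: " + tracker.prepareFuture.isDone());
        System.out.println("prepare succeeded: " + tracker.prepareFuture.join());
    }
}
```

With {{waitForAll}} set, the future still completes with a failure, but only after the last participant has reported, so the caller never returns while a prepare operation is outstanding.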

The issue is that when the node is both coordinator and participant, we 
may end up with a local session and submitted anti-compactions that keep 
executing without any coordination with the coordinator session (on the same 
node). As a result, running repair commands one right after another may cause 
overlapping execution of anti-compactions, which causes the following 
(misleading) message to show up in the logs and the repair to fail again:
 "Prepare phase for incremental repair session %s has failed because it 
encountered intersecting sstables belonging to another incremental repair 
session (%s). This is by starting an incremental repair session before a 
previous one has completed. Check nodetool repair_admin for hung sessions and 
fix them."
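
The overlap can be pictured with a small, simplified model. This is only a sketch under assumptions: {{PendingRepairRegistry}}, {{tryAcquire}} and {{release}} are hypothetical names and do not correspond to Cassandra classes. It shows why a second session fails while the sstables claimed by a previous session's still-running anti-compaction have not been released.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

// Hypothetical, simplified model of sstables being claimed by repair sessions.
// This is NOT how Cassandra tracks pending repairs internally.
class PendingRepairRegistry {
    private final Map<String, UUID> ownerBySstable = new HashMap<>();

    // Returns the conflicting session if any requested sstable is already
    // claimed by another session; otherwise claims all of them.
    synchronized Optional<UUID> tryAcquire(UUID session, Set<String> sstables) {
        for (String sstable : sstables) {
            UUID owner = ownerBySstable.get(sstable);
            if (owner != null && !owner.equals(session)) {
                return Optional.of(owner);
            }
        }
        for (String sstable : sstables) {
            ownerBySstable.put(sstable, session);
        }
        return Optional.empty();
    }

    // Drops every claim held by the given session.
    synchronized void release(UUID session) {
        ownerBySstable.values().removeIf(session::equals);
    }

    public static void main(String[] args) {
        PendingRepairRegistry registry = new PendingRepairRegistry();
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();

        // The first repair failed fast, but its local anti-compaction still holds sstables.
        registry.tryAcquire(first, Set.of("sstable-1", "sstable-2"));

        // A second repair started right afterwards intersects with the first one.
        System.out.println("conflict: " + registry.tryAcquire(second, Set.of("sstable-2", "sstable-3")).isPresent());

        // Once the first session's work has finished and released its claims, the retry succeeds.
        registry.release(first);
        System.out.println("conflict after release: " + registry.tryAcquire(second, Set.of("sstable-2", "sstable-3")).isPresent());
    }
}
```

In this model the conflict does not come from a hung session, which is why the log message is misleading: the previous session has already failed from the user's point of view, but its local work has not yet finished.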



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
