[ https://issues.apache.org/jira/browse/CASSANDRA-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paulo Motta reassigned CASSANDRA-21115:
---------------------------------------
Assignee: Paulo Motta
> Auto-repair skips incomplete first repair after node restart due to ordering
> of checks
> --------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21115
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21115
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When a node starts its very first auto-repair and crashes before completing
> it, the repair is not resumed after restart. Instead, the "too soon to
> repair" check skips it for up to 24 hours.
> *What happens*
> 1. Node joins the cluster, no repair history exists yet
> 2. insertNewRepairHistory() creates a record with both repair_start_ts and
> repair_finish_ts set to the current time (let's call it T1)
> 3. When repair actually starts, only repair_start_ts is updated (to T2);
> repair_finish_ts stays at T1
> 4. Node crashes mid-repair
> 5. On restart, tooSoonToRunRepair() is called before myTurnToRunRepair()
> 6. It queries repair_finish_ts which is still T1 (the record creation time,
> not an actual repair completion)
> 7. If less than 24h have passed since T1, the check returns "too soon" and
> bails out
> 8. The logic in myTurnToRunRepair() that detects ongoing repairs
> (repair_start_ts > repair_finish_ts) never gets a chance to run
> *Expected behavior*
> A repair that was in progress should be resumed after restart, regardless
> of the min_repair_interval setting. The "too soon" check should not apply to
> incomplete repairs.
> *How to reproduce*
> 1. Set up a fresh node with auto-repair enabled
> 2. Wait for the first repair to start
> 3. Kill the node before repair completes
> 4. Restart the node within 24 hours
> 5. Observe that repair is skipped with "Too soon to run repair" in the logs
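
The ordering problem described above can be sketched as a small standalone Java model. This is not the Cassandra implementation: the RepairHistory class, the method signatures, the shouldRun* helpers, and the hard-coded 24-hour interval are assumptions made for illustration. Only the names tooSoonToRunRepair, myTurnToRunRepair, repair_start_ts, and repair_finish_ts come from the report, and the turn-taking logic in myTurnToRunRepair() is reduced here to the in-progress comparison.

```java
import java.time.Duration;

// Illustrative model only; not the actual Cassandra auto-repair code.
final class AutoRepairOrderingSketch {

    // Hypothetical stand-in for one node's repair history record.
    static final class RepairHistory {
        final long repairStartTs;
        final long repairFinishTs;

        RepairHistory(long repairStartTs, long repairFinishTs) {
            this.repairStartTs = repairStartTs;
            this.repairFinishTs = repairFinishTs;
        }

        // A repair that started but never finished leaves start > finish.
        boolean inProgress() {
            return repairStartTs > repairFinishTs;
        }
    }

    // Assumed value of min_repair_interval for this sketch.
    static final long MIN_REPAIR_INTERVAL_MS = Duration.ofHours(24).toMillis();

    // Simplified "too soon" check: compares now against repair_finish_ts only.
    static boolean tooSoonToRunRepair(RepairHistory h, long nowMs) {
        return nowMs - h.repairFinishTs < MIN_REPAIR_INTERVAL_MS;
    }

    // Ordering from the report: the interval check runs first, so the
    // in-progress detection is never reached for an interrupted repair.
    static boolean shouldRunReportedOrdering(RepairHistory h, long nowMs) {
        if (tooSoonToRunRepair(h, nowMs)) {
            return false;
        }
        return true; // the turn check would follow here; omitted in this model
    }

    // One possible reordering: resume an interrupted repair before
    // applying the interval check.
    static boolean shouldRunResumeFirst(RepairHistory h, long nowMs) {
        if (h.inProgress()) {
            return true;
        }
        return !tooSoonToRunRepair(h, nowMs);
    }

    public static void main(String[] args) {
        long t1 = 0L;                                       // record created
        long t2 = t1 + Duration.ofMinutes(5).toMillis();    // repair started
        long restart = t1 + Duration.ofHours(1).toMillis(); // node restarted

        RepairHistory interrupted = new RepairHistory(t2, t1);
        System.out.println("reported ordering runs repair: "
                + shouldRunReportedOrdering(interrupted, restart));
        System.out.println("resume-first ordering runs repair: "
                + shouldRunResumeFirst(interrupted, restart));
    }
}
```

With T1 as the record creation time, T2 five minutes later, and a restart one hour after T1, the reported ordering declines to run while the resume-first ordering runs the repair. Whether the actual fix reorders the checks or changes how repair_finish_ts is initialized is not stated in the report.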