[ 
https://issues.apache.org/jira/browse/CASSANDRA-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-21115:
------------------------------------
          Fix Version/s: 5.1
          Since Version: 5.1
    Source Control Link: 
https://github.com/apache/cassandra/commit/afa55123c87be7fd31a68abd87c3427141fe60c0
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

PR approved, thanks [[email protected]] 

> Initial auto-repairs can be skipped by too soon check
> -----------------------------------------------------
>
>                 Key: CASSANDRA-21115
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21115
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Normal
>             Fix For: 5.1
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> *Problem*
> When a repair history record is created, both repair_start_ts and 
> repair_finish_ts are initialized to the same timestamp. tooSoonToRunRepair() 
> reads repair_finish_ts and, if less than min_repair_interval has elapsed 
> since that timestamp, immediately returns "too soon" and aborts. As a result, 
> myTurnToRunRepair() never executes, so both the turn-to-run check and the 
> incomplete repair detection are skipped.
> *When this occurs*
>   1. Cross-node initialization: Node A calls insertNewRepairHistory() and 
> creates a history record for Node B with start_ts = finish_ts = now(). When 
> Node B attempts repair, it sees this fresh timestamp and incorrectly skips, 
> thinking it just completed a repair.
>   2. First repair interruption: A node starts its first repair (updating 
> start_ts) but crashes or fails before completion (finish_ts unchanged). After 
> restart, tooSoonToRunRepair() sees the initialization timestamp in finish_ts 
> and may skip the incomplete repair.
> *Fix*
> Add a check in tooSoonToRunRepair(): if repair_start_ts >= repair_finish_ts, 
> the repair is either unstarted or incomplete. Return false immediately to 
> allow it to proceed, bypassing the interval check.
> *Impact*
> Nodes skip their initial repair attempts and wait unnecessarily until 
> min_repair_interval elapses from record creation, delaying the first repair 
> cycle and allowing data inconsistencies to accumulate.
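
The guard described under *Fix* can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the committed patch: the class name TooSoonCheckSketch, the method signature, and the use of epoch-millisecond longs with java.time.Duration are hypothetical choices made for the example. Only the names tooSoonToRunRepair, repair_start_ts, repair_finish_ts and min_repair_interval come from the ticket.

```java
import java.time.Duration;

// Hypothetical, simplified model of the check; not Cassandra's actual class.
final class TooSoonCheckSketch {

    /**
     * Returns true when a new repair should be skipped because the previous
     * one finished less than minRepairInterval ago.
     * All timestamps are assumed to be epoch milliseconds.
     */
    static boolean tooSoonToRunRepair(long repairStartTs,
                                      long repairFinishTs,
                                      long nowMillis,
                                      Duration minRepairInterval) {
        // Guard from the ticket: start >= finish means the record was only
        // initialized (start == finish) or a repair began but never
        // completed (start > finish). Either way, do not block the repair.
        if (repairStartTs >= repairFinishTs) {
            return false;
        }
        // Otherwise apply the usual interval check against the finish time.
        return nowMillis - repairFinishTs < minRepairInterval.toMillis();
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofHours(24);

        // Scenario 1: record freshly created with start_ts == finish_ts.
        System.out.println("fresh record: "
                + tooSoonToRunRepair(1_000L, 1_000L, 1_005L, interval));

        // Scenario 2: first repair started (start_ts advanced), then crashed.
        System.out.println("interrupted repair: "
                + tooSoonToRunRepair(5_000L, 1_000L, 6_000L, interval));

        // Normal case: a repair completed shortly before "now".
        System.out.println("recently completed: "
                + tooSoonToRunRepair(1_000L, 2_000L, 3_000L, interval));
    }
}
```

With this guard, the interval check only applies to records whose finish timestamp is strictly later than the start timestamp, which is the case the ticket treats as a genuinely completed repair.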



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

