[
https://issues.apache.org/jira/browse/CASSANDRA-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paulo Motta updated CASSANDRA-21115:
------------------------------------
Fix Version/s: 5.1
Since Version: 5.1
Source Control Link:
https://github.com/apache/cassandra/commit/afa55123c87be7fd31a68abd87c3427141fe60c0
Resolution: Fixed
Status: Resolved (was: Ready to Commit)
PR approved, thanks [[email protected]]
> Initial auto-repairs can be skipped by too soon check
> -----------------------------------------------------
>
> Key: CASSANDRA-21115
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21115
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Normal
> Fix For: 5.1
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> *Problem*
> When a repair history record is created, both repair_start_ts and
> repair_finish_ts are initialized to the same timestamp. tooSoonToRunRepair()
> reads repair_finish_ts and, if it falls within min_repair_interval of the
> current time, immediately reports "too soon" and the repair attempt aborts.
> As a result myTurnToRunRepair() never executes, so both the turn-to-run
> check and the incomplete repair detection are skipped.
> *When this occurs*
> 1. Cross-node initialization: Node A calls insertNewRepairHistory() and
> creates a history record for Node B with start_ts = finish_ts = now(). When
> Node B attempts repair, it sees this fresh timestamp and incorrectly skips,
> thinking it just completed a repair.
> 2. First repair interruption: A node starts its first repair (updating
> start_ts) but crashes or fails before completion (finish_ts unchanged). After
> restart, tooSoonToRunRepair() sees the initialization timestamp in finish_ts
> and may skip the incomplete repair.
> *Fix*
> Add a check in tooSoonToRunRepair(): if repair_start_ts >= repair_finish_ts,
> the repair is either unstarted or incomplete. Return false immediately to
> allow it to proceed, bypassing the interval check.
> *Impact*
> Nodes skip their initial repair attempts and wait unnecessarily until
> min_repair_interval elapses from record creation, delaying the first repair
> cycle and allowing data inconsistencies to accumulate.
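
For readers skimming the thread, the check described under *Fix* can be
sketched roughly as below. This is an illustration only, not the committed
patch: the class TooSoonCheckSketch, the RepairRecord type, the millisecond
units, and the method signature are all assumptions made for the example.
Only the comparison repair_start_ts >= repair_finish_ts and the
min_repair_interval check come from the issue description.

```java
public final class TooSoonCheckSketch {

    /** Hypothetical stand-in for one node's repair history row. */
    public static final class RepairRecord {
        final long startTsMillis;   // models repair_start_ts
        final long finishTsMillis;  // models repair_finish_ts

        public RepairRecord(long startTsMillis, long finishTsMillis) {
            this.startTsMillis = startTsMillis;
            this.finishTsMillis = finishTsMillis;
        }
    }

    /**
     * Returns true only when a repair has actually completed and did so
     * less than minRepairIntervalMillis ago (signature is an assumption).
     */
    public static boolean tooSoonToRunRepair(RepairRecord record,
                                             long nowMillis,
                                             long minRepairIntervalMillis) {
        // start >= finish means the record was only initialized
        // (start == finish) or a repair started but never finished
        // (start > finish). Neither counts as a recent completed repair.
        if (record.startTsMillis >= record.finishTsMillis) {
            return false;
        }
        // Otherwise apply the usual interval check against the finish time.
        return nowMillis - record.finishTsMillis < minRepairIntervalMillis;
    }

    public static void main(String[] args) {
        long now = 10_000_000L;
        long interval = 3_600_000L; // one hour, chosen arbitrarily

        // Scenario 1 from the issue: record freshly created by another node.
        System.out.println(tooSoonToRunRepair(new RepairRecord(now, now), now, interval));
        // Scenario 2 from the issue: first repair started but was interrupted.
        System.out.println(tooSoonToRunRepair(new RepairRecord(now - 1_000, now - 5_000), now, interval));
        // A repair that really did finish one second ago.
        System.out.println(tooSoonToRunRepair(new RepairRecord(now - 5_000, now - 1_000), now, interval));
    }
}
```

In this sketch both scenarios from the description fall through the early
return, so the caller can go on to the turn-to-run and incomplete-repair
checks, while a repair that genuinely finished inside the interval is still
reported as too soon.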