[jira] [Updated] (PHOENIX-7802) Add replayState check to shouldTriggerFailover() for defense-in-depth

Himanshu Gwalani (Jira) Fri, 17 Apr 2026 06:33:09 -0700


     [ https://issues.apache.org/jira/browse/PHOENIX-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Himanshu Gwalani updated PHOENIX-7802:
--------------------------------------
    Description: 
*Issue*
If getFirstRoundToProcess() returns empty (when the degradation lasted less than ~69 sec), 
control skips the SYNCED_RECOVERY step and goes directly to shouldTriggerFailover().

{*}Note{*}: Existing safeguards (new files check, timing constraints) prevent 
data loss in production, but this fix protects against future edge cases.

*Solution*
Add a replicationReplayState check to shouldTriggerFailover() in 
ReplicationLogDiscoveryReplay.java as a defense-in-depth improvement.
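A minimal sketch of the proposed guard, under a simplified model. The ReplayState enum and its SYNC and DEGRADED values, the boolean inputs standing in for the existing new-files and timing safeguards, and the replayStep() helper are all hypothetical, not the actual ReplicationLogDiscoveryReplay code. It shows how an empty first round skips recovery and falls through to shouldTriggerFailover(), and how an extra state check could block failover unless replay is (by assumption) in SYNC.

```java
import java.util.Optional;

// Hypothetical sketch: only SYNCED_RECOVERY, getFirstRoundToProcess() and
// shouldTriggerFailover() come from the issue; everything else is assumed.
class FailoverGuardSketch {

    // Assumed replay states; SYNC and DEGRADED are illustrative names.
    enum ReplayState { SYNC, DEGRADED, SYNCED_RECOVERY }

    // Failover decision: the two booleans stand in for the existing
    // safeguards (new-files check, timing constraint); the state check is
    // the additional, independent defense-in-depth condition.
    static boolean shouldTriggerFailover(ReplayState state,
                                         boolean hasNewFiles,
                                         boolean timingConstraintMet) {
        if (state != ReplayState.SYNC) {
            return false; // never fail over while replay is not in sync
        }
        return !hasNewFiles && timingConstraintMet;
    }

    // Simplified replay step. When no first round is found, the recovery
    // step is skipped and the current state reaches the failover decision
    // unchanged. Assumption: a completed recovery moves the state to SYNC.
    static boolean replayStep(Optional<Long> firstRound, ReplayState state,
                              boolean hasNewFiles, boolean timingConstraintMet) {
        ReplayState effective = firstRound.isPresent() ? ReplayState.SYNC : state;
        return shouldTriggerFailover(effective, hasNewFiles, timingConstraintMet);
    }

    public static void main(String[] args) {
        // Recovery skipped while still in SYNCED_RECOVERY: blocked by the new check.
        System.out.println(replayStep(Optional.empty(), ReplayState.SYNCED_RECOVERY, false, true));
        // Already in SYNC with the other safeguards satisfied: failover allowed.
        System.out.println(replayStep(Optional.empty(), ReplayState.SYNC, false, true));
    }
}
```

Because the state check does not depend on the new-files or timing checks, failover stays blocked even if a future change weakens one of those safeguards, which is the intent of a defense-in-depth check.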

  was:
The replay service can get stuck in an infinite loop if a persistent error 
occurs while processing older files in the in-progress directory.


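{code:java}
// Re-fetch the older in-progress files after each attempt. If processing
// keeps failing, the files stay in the in-progress directory, the list never
// becomes empty, and the loop never exits.
files = replicationLogTracker.getOlderInProgressFiles(oldestTimestampToProcess);
while (!files.isEmpty()) {
  processOneRandomFile(files);
  files = replicationLogTracker.getOlderInProgressFiles(oldestTimestampToProcess);
}
{code}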


> Add replayState check to shouldTriggerFailover() for defense-in-depth
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-7802
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7802
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Himanshu Gwalani
>            Assignee: Himanshu Gwalani
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
