[
https://issues.apache.org/jira/browse/PHOENIX-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Himanshu Gwalani updated PHOENIX-7802:
--------------------------------------
Description:
*Issue*
If getFirstRoundToProcess() returns empty (degradation < ~69 sec), control
skips SYNCED_RECOVERY and proceeds directly to shouldTriggerFailover().
*Note*: Existing safeguards (new files check, timing constraints) prevent
data loss in production, but this fix protects against future edge cases.
*Solution*
Add replicationReplayState check to shouldTriggerFailover() in
ReplicationLogDiscoveryReplay.java as a defense-in-depth improvement.
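A minimal sketch of what such a guard might look like. This is not the actual ReplicationLogDiscoveryReplay code: the class name, the enum values other than SYNCED_RECOVERY, the method parameters, and the assumption that only a fully synced state may allow failover are all hypothetical and chosen for illustration.

```java
// Hypothetical sketch; not the real Phoenix implementation.
class FailoverGuardSketch {

    // Assumed set of replay states. Only SYNCED_RECOVERY is named in this issue.
    enum ReplayState { NOT_INITIALIZED, SYNC, DEGRADED, SYNCED_RECOVERY }

    // Hypothetical signature: the real method may read these values from fields.
    static boolean shouldTriggerFailover(ReplayState replayState,
                                         boolean failoverRequested,
                                         boolean inProgressFilesPending) {
        // Defense-in-depth: refuse failover unless replay is in the (assumed) SYNC state,
        // even if the other safeguards below would have allowed it.
        if (replayState != ReplayState.SYNC) {
            return false;
        }
        // Stand-ins for the existing safeguards mentioned in the description.
        return failoverRequested && !inProgressFilesPending;
    }

    public static void main(String[] args) {
        // Recovery was skipped or is still pending: the guard blocks failover.
        System.out.println(shouldTriggerFailover(ReplayState.SYNCED_RECOVERY, true, false)); // false
        // Fully synced, failover requested, nothing pending: failover may proceed.
        System.out.println(shouldTriggerFailover(ReplayState.SYNC, true, false)); // true
    }
}
```

Placing the state check first keeps it independent of the other conditions, so it still holds if those safeguards change later.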
was:
The replay service can get stuck in an infinite loop if there is a persistent
issue while processing older files in the in-progress directory.
{code:java}
files = replicationLogTracker.getOlderInProgressFiles(oldestTimestampToProcess);
while (!files.isEmpty()) {
    processOneRandomFile(files);
    files = replicationLogTracker.getOlderInProgressFiles(oldestTimestampToProcess);
}
{code}
> Add replayState check to shouldTriggerFailover() for defense-in-depth
> ---------------------------------------------------------------------
>
> Key: PHOENIX-7802
> URL: https://issues.apache.org/jira/browse/PHOENIX-7802
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Himanshu Gwalani
> Assignee: Himanshu Gwalani
> Priority: Major