[ https://issues.apache.org/jira/browse/PHOENIX-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lokesh Khurana resolved PHOENIX-7870.
-------------------------------------
Resolution: Resolved
> GetClusterRoleRecordUtil: per-HA-group poller futures + url1/url2 alternation
> -----------------------------------------------------------------------------
>
> Key: PHOENIX-7870
> URL: https://issues.apache.org/jira/browse/PHOENIX-7870
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Lokesh Khurana
> Assignee: Lokesh Khurana
> Priority: Major
>
> GetClusterRoleRecordUtil has two related bugs in its non-active poller logic.
> Bug 1: Cross-group cancel collision via shared static pollerFuture
> The class declares a single static volatile ScheduledFuture<?> pollerFuture
> field that is overwritten by every call to schedulePoller, regardless of the
> HA group name. The companion schedulerMap is correctly keyed by HA group
> name, but the future itself is not. When two HA groups poll concurrently,
> the second group's schedulePoller overwrites pollerFuture with its own
> future. The first group's later cancel-on-active branch then calls
> pollerFuture.cancel(false), cancelling the wrong group's future. The first
> group's poller is left orphaned: still running on the scheduler, but no
> longer tracked, so it can never be cancelled cleanly. The affected group's
> CRR (cluster role record) cache stops refreshing and the client keeps
> routing to the last-known active cluster even after the operator promotes a
> new active.
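The collision in Bug 1 can be avoided by tracking one future per HA group. What follows is a minimal sketch of that idea, not the actual patch: the class PerGroupPollers, its method names, and the single shared scheduler are all assumptions made for illustration (the real class keeps its schedulers in schedulerMap, keyed by HA group name).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the GetClusterRoleRecordUtil source.
class PerGroupPollers {
    // Assumption: one shared scheduler keeps the sketch short.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // One future per HA group name, replacing the single shared static field.
    private final ConcurrentMap<String, ScheduledFuture<?>> pollerFutures =
        new ConcurrentHashMap<>();

    /** Starts (or restarts) the poller for exactly one HA group. */
    void schedulePoller(String haGroupName, Runnable tick, long periodMs) {
        pollerFutures.compute(haGroupName, (name, existing) -> {
            if (existing != null) {
                // Replace only this group's previous poller, never another group's.
                existing.cancel(false);
            }
            return scheduler.scheduleAtFixedRate(
                tick, 0, periodMs, TimeUnit.MILLISECONDS);
        });
    }

    /** Cancels only the named group's poller; other groups are untouched. */
    boolean cancelPoller(String haGroupName) {
        ScheduledFuture<?> future = pollerFutures.remove(haGroupName);
        return future != null && future.cancel(false);
    }

    /** True while the named group has a tracked, non-cancelled poller. */
    boolean isPolling(String haGroupName) {
        ScheduledFuture<?> future = pollerFutures.get(haGroupName);
        return future != null && !future.isCancelled();
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Because every lookup and cancel goes through the group name, a cancel issued for one group cannot reach another group's future, and no poller is left running without a tracked handle.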
> Bug 2: Poller pins to a single URL with no alternation or peer fallback
> schedulePoller accepts a single url parameter and the polling lambda closes
> over it. Every tick calls getClusterRoleRecord(url, ...) against the same
> URL. There is no alternation between url1 and url2, and no fallback on
> SQLException. If the cluster behind the bound URL becomes unreachable after
> the poller starts, every tick throws and the poller never recovers: no
> peer-side check happens, even when the peer cluster is healthy and would
> correctly report the new role.
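For Bug 2, one possible shape of the fix is to alternate the starting URL on each tick and fall back to the peer when the first lookup throws SQLException. The sketch below only illustrates that behaviour under stated assumptions: AlternatingRolePoller, RecordFetcher, and pollOnce are hypothetical names, and RecordFetcher stands in for the real getClusterRoleRecord(url, ...) call, whose remaining arguments are not shown in the issue.

```java
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; not the GetClusterRoleRecordUtil source.
class AlternatingRolePoller {
    /** Stand-in (assumption) for the getClusterRoleRecord(url, ...) lookup. */
    interface RecordFetcher<T> {
        T fetch(String url) throws SQLException;
    }

    private final String url1;
    private final String url2;
    private final AtomicLong tickCount = new AtomicLong();

    AlternatingRolePoller(String url1, String url2) {
        this.url1 = url1;
        this.url2 = url2;
    }

    /**
     * One poll tick: even ticks start with url1, odd ticks with url2.
     * If the first URL throws, the peer URL is tried before giving up.
     */
    <T> T pollOnce(RecordFetcher<T> fetcher) throws SQLException {
        boolean evenTick = (tickCount.getAndIncrement() % 2) == 0;
        String first = evenTick ? url1 : url2;
        String peer = evenTick ? url2 : url1;
        try {
            return fetcher.fetch(first);
        } catch (SQLException firstFailure) {
            try {
                // Peer fallback: a healthy peer can still report the new role.
                return fetcher.fetch(peer);
            } catch (SQLException peerFailure) {
                // Both clusters failed; keep the first failure for diagnosis.
                peerFailure.addSuppressed(firstFailure);
                throw peerFailure;
            }
        }
    }
}
```

With this shape, an unreachable cluster costs at most one failed call per tick, and the tick still returns whatever the healthy peer reports.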