[ 
https://issues.apache.org/jira/browse/PHOENIX-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Khurana resolved PHOENIX-7870.
-------------------------------------
    Resolution: Resolved

> GetClusterRoleRecordUtil: per-HA-group poller futures + url1/url2 alternation
> -----------------------------------------------------------------------------
>
>                 Key: PHOENIX-7870
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7870
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Lokesh Khurana
>            Assignee: Lokesh Khurana
>            Priority: Major
>
> GetClusterRoleRecordUtil has two related bugs in its non-active poller logic.
>
> Bug 1: Cross-group cancel collision via shared static pollerFuture
> The class declares a single static volatile ScheduledFuture<?> pollerFuture
> field that every call to schedulePoller overwrites, regardless of the HA
> group name. The companion schedulerMap is correctly keyed by HA group name,
> but the future itself is not. When two HA groups poll concurrently, the
> second group's schedulePoller overwrites pollerFuture with its own future.
> The first group's later cancel-on-active branch then calls
> pollerFuture.cancel(false), cancelling the wrong group's future. The first
> group's poller is left orphaned: still running on the scheduler, but no
> longer tracked, so it can never be cancelled cleanly. The affected group's
> CRR cache stops refreshing, and the client keeps routing to the last-known
> active even after the operator promotes a new active.
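
The collision described in Bug 1 goes away once each HA group tracks its own
future. The following is a minimal sketch of that idea, not the actual patch:
the class PerGroupPollers and its method signatures are hypothetical, and only
the names schedulePoller and pollerFuture(s) echo the issue text.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one tracked poller future per HA group. */
final class PerGroupPollers {
    // Keyed by HA group name, so cancelling one group cannot touch another group's future.
    private final ConcurrentMap<String, ScheduledFuture<?>> pollerFutures =
            new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler;

    PerGroupPollers(ScheduledExecutorService scheduler) {
        this.scheduler = scheduler;
    }

    /** Schedules a poller for the group and cancels any earlier poller of that same group. */
    void schedulePoller(String haGroupName, Runnable tick, long periodMs) {
        ScheduledFuture<?> next =
                scheduler.scheduleAtFixedRate(tick, periodMs, periodMs, TimeUnit.MILLISECONDS);
        ScheduledFuture<?> previous = pollerFutures.put(haGroupName, next);
        if (previous != null) {
            previous.cancel(false);
        }
    }

    /** Cancels only the named group's poller; returns true if one was tracked. */
    boolean cancelPoller(String haGroupName) {
        ScheduledFuture<?> future = pollerFutures.remove(haGroupName);
        if (future == null) {
            return false;
        }
        future.cancel(false);
        return true;
    }

    /** True while the named group has a tracked, non-cancelled poller. */
    boolean isPolling(String haGroupName) {
        ScheduledFuture<?> future = pollerFutures.get(haGroupName);
        return future != null && !future.isCancelled();
    }
}
```

With this shape, a cancel-on-active branch for one group calls
cancelPoller(haGroupName) and leaves every other group's poller running and
tracked.
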
>
> Bug 2: Poller pins to a single URL with no alternation or peer fallback
> schedulePoller accepts a single url parameter, and the polling lambda closes
> over it. Every tick calls getClusterRoleRecord(url, ...) against the same
> URL. There is no alternation between url1 and url2, and no fallback on
> SQLException. If the cluster behind the bound URL becomes unreachable after
> the poller starts, every tick throws and the poller never escapes. No
> peer-side check happens, even when the peer cluster is healthy and would
> correctly report the new role.
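
One way to picture the url1/url2 alternation named in the issue title is a
tick that starts at alternating URLs and tries the peer when the first call
throws SQLException. This is a sketch under assumptions, not the committed
fix: AlternatingPoller and Fetcher are hypothetical names, and Fetcher stands
in for getClusterRoleRecord(url, ...), whose real signature is not shown here.

```java
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: alternate between url1 and url2, with peer fallback on SQLException. */
final class AlternatingPoller<T> {
    /** Stand-in for a call such as getClusterRoleRecord(url, ...). */
    interface Fetcher<R> {
        R fetch(String url) throws SQLException;
    }

    private final String[] urls;
    private final Fetcher<T> fetcher;
    private final AtomicInteger tick = new AtomicInteger();

    AlternatingPoller(String url1, String url2, Fetcher<T> fetcher) {
        this.urls = new String[] {url1, url2};
        this.fetcher = fetcher;
    }

    /** One poll tick: try the URL whose turn it is, then the peer if that call fails. */
    T pollOnce() throws SQLException {
        // Even ticks start at url1, odd ticks at url2.
        int start = Math.floorMod(tick.getAndIncrement(), 2);
        try {
            return fetcher.fetch(urls[start]);
        } catch (SQLException first) {
            try {
                return fetcher.fetch(urls[1 - start]);
            } catch (SQLException second) {
                // Both clusters failed; keep the first failure attached for diagnosis.
                second.addSuppressed(first);
                throw second;
            }
        }
    }
}
```

Under this scheme an unreachable cluster costs at most one failed call per
tick, and the healthy peer still gets asked for the current role.
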



