ritegarg opened a new pull request, #2479:
URL: https://github.com/apache/phoenix/pull/2479

   ### What changes were proposed in this pull request?
   
   Adds an opt-in sync from `/phoenix/consistentHA/<G>` to the legacy 
`/phoenix/ha/<G>` znode in
   each region server's `HAGroupStoreClient`. Each client derives a combined 
`ClusterRoleRecord`
   (local role + peer role) and writes it to its local `/phoenix/ha` via ZK 
stat-version CAS, so
   pre-consistentHA ZK-registry clients keep reading a current view of the HA 
group.
   
   **New config keys** (in `QueryServices` / `QueryServicesOptions`):
   
   * `phoenix.ha.legacy.crr.sync.enabled` — master switch (default `false`).
   * `phoenix.ha.legacy.crr.reconciliation.interval.seconds` — periodic 
reconciliation cadence
     (default `60`; `0` disables only the periodic loop, the event-driven path 
still runs).
   
   **Convergence model:**
   
   * Event-driven on consistentHA `CHILD_ADDED` / `CHILD_UPDATED` (LOCAL or 
PEER), offloaded onto
     a per-group single-thread `ScheduledExecutorService` so Curator's 
per-namespace event
     dispatcher never blocks on ZK or JDBC I/O.
   * Periodic reconciler on the same executor (0–30s jitter on first run).
   * Curator `NodeCache` watcher on the legacy znode itself, so per-sync reads 
are in-memory
     rather than a ZK round-trip. The NodeCache is eventually consistent, so on 
apparent absence
     the sync path falls back to one authoritative `getData()` to disambiguate 
"absent" from
     "not-yet-cached" before deciding `CREATE_NEW` vs `CAS_WITH_VERSION`.
   * On CAS `BadVersion`: log at INFO and bail. The next event/periodic cycle 
reconverges. No
     inline retry.
   
   **Code-level changes:**
   
   * `ClusterRoleRecord`: new explicit-`RegistryType` ctor; `@JsonCreator` now 
round-trips
     `registryType` (legacy znode is always written with `RegistryType.ZK` for 
backward
     compatibility, and must read back as ZK). New 
`isLogicallyEqualIgnoringVersionAndRegistry`
     helper used as the rewrite short-circuit.
   * `PhoenixHAAdmin`:
       * Three new helpers on the legacy namespace:
           * `getClusterRoleRecordAndStatInZooKeeper` — atomic (data, stat) 
read via
             `storingStatIn`.
           * `createOrUpdateClusterRoleRecordWithCAS(name, record, 
LegacyCrrWriteMode mode, int
             expectedStatVersion)` — typed write API. `LegacyCrrWriteMode` is a 
Phoenix-internal
             enum with values `CREATE_NEW`, `FORCE_OVERWRITE` 
(`@VisibleForTesting`),
             `CAS_WITH_VERSION`; the `int` argument is meaningful only for 
`CAS_WITH_VERSION`
             (validated `>= 0` at entry). Both `BadVersion` (CAS lost) and 
`NodeExists` (create
             lost) surface as `StaleClusterRoleRecordVersionException` so 
callers retry
             uniformly.
       * Atomic-read fix on the existing `getHAGroupStoreRecordInZooKeeper` 
(replaces a
         pre-existing 2-RPC pattern that could return a stat version not 
matching its data
         bytes).
   * `HAGroupStoreClient`: `legacyHaAdmin` + `legacyCrrNodeCache` + dedicated 
periodic executor.
     Listener hooks for `CHILD_ADDED` / `CHILD_UPDATED`. `CHILD_REMOVED` is 
intentionally a
     no-op — the legacy `/phoenix/ha` znode is never deleted by this client.
   * New `StaleClusterRoleRecordVersionException` (CRR-specific analog of
     `StaleHAGroupStoreRecordVersionException`).
   
   **Concurrency / lifecycle invariants:**
   
   * All sync invocations (initial, event-driven, periodic) route through the 
single-thread
     executor, so they are serialized naturally; no per-instance lock is held 
across ZK I/O.
   * The initial sync is submitted to the executor rather than invoked inline 
in the constructor,
     so a fresh `HAGroupStoreClient` does not block `getInstanceForZkUrl`'s 
static class monitor
     on ZK / JDBC I/O.
   * `syncLegacyCRRIfRoleChanged` snapshots `legacyHaAdmin` / 
`legacyCrrNodeCache` to local
     finals before touching them, so a racing `close()` cannot NPE an in-flight 
call or push
     writes through a torn-down Curator client.
   * `close()` marks the instance `isHealthy=false` and removes it from the 
static `instances`
     map up front, so a concurrent `getInstanceForZkUrl` cannot hand out a 
half-closed client.
   
   **Back-compat and divergence guards:**
   
   * Refuse to overwrite a non-ZK admin-managed legacy CRR (typically 
`RegistryType.RPC` with
     master-form URLs). Existing readers feed the stored URLs through
     `JDBCUtil.formatUrl(url, roleRecord.getRegistryType())` to build 
connection strings;
     swapping `registryType` + URLs out from under them would silently change 
their target. The
     sync skips with a WARN and requires an explicit admin migration to ZK form 
before
     proceeding.
   * Wait for a non-blank peer ZK URL on the local record before building the 
desired CRR;
     `ClusterRoleRecord`'s ctor would otherwise NPE on `null url2` via 
`JDBCUtil.formatUrl`.
   * When the local peer cache returns null, preserve `existing.role2` in the 
desired CRR
     rather than downgrading to `UNKNOWN`. `UNKNOWN` is strictly less 
information than a
     previously-recorded role, and downgrading would cause a multi-RS write 
storm where peers
     with differing visibility CAS-overwrite each other.
   
   ### Why are the changes needed?
   
   The Consistent Cluster Failover work moved HA group state from `/phoenix/ha` 
(which held a
   `ClusterRoleRecord`) to `/phoenix/consistentHA` (which holds an 
`HAGroupStoreRecord` and uses
   an RPC-registry view of CRRs). Existing Phoenix clients built against the 
older ZK-registry
   contract still read `/phoenix/ha/<G>` and expect:
   
   * The znode to exist for any live HA group.
   * The record's `registryType` to be `ZK`.
   * The roles to reflect the latest state of both clusters.
   
   Without this patch, after a CCF rollout the `/phoenix/ha` znode goes stale 
or absent, and old
   ZK-registry clients break. This change keeps `/phoenix/ha` in sync with the 
new authoritative
   `/phoenix/consistentHA` records on a per-RS basis, gated by an opt-in flag 
so it only runs
   where the operator needs it.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, but the new behavior is gated entirely behind the new master switch
   (`phoenix.ha.legacy.crr.sync.enabled`), which defaults to `false`.
   
   When the flag is enabled:
   
   * Each `HAGroupStoreClient` writes to (and maintains) the legacy 
`/phoenix/ha/<G>` znode on
     its local ZK.
   * Old pre-consistentHA ZK-registry clients pointed at `/phoenix/ha` will see 
fresh records
     again.
   * The flag is read once at construction; toggling it off requires a process 
restart. This is
     documented in `HAGroupStoreClient`'s class Javadoc.
   * **Rollout constraint:** the persisted legacy znode now carries the new 
`registryType`
     field. Pre-PR clients using the default Jackson `ObjectMapper`
     (`FAIL_ON_UNKNOWN_PROPERTIES=true`) will reject the new bytes. All readers 
should be
     upgraded to a tolerant build **before** enabling this flag on any writer.
   
   One incidental visible change independent of the flag: `ClusterRoleRecord` 
JSON
   deserialization now respects the persisted `registryType` field. Previously 
it was hard-coded
   to `RPC` regardless of JSON content. Existing JSON without a `registryType` 
field still
   defaults to `RPC`, so deployments not exposed to legacy-sync writes see no 
behavior change.
   
   ### How was this patch tested?
   
   * Integration tests in `HAGroupStoreClientIT` (40 cases, all passing) 
covering: feature-off
     no-op, initial create, role-changing transition, state-only transition (no 
rewrite),
     peer-driven role flip, peer-absent → `role2=UNKNOWN` → recovery, CAS error 
mapping +
     write-mode dispatch (`CREATE_NEW`, `FORCE_OVERWRITE`, `CAS_WITH_VERSION` 
negative-version
     rejection), LOCAL/PEER `CHILD_REMOVED` (znode preserved), `registryType` 
stays ZK across
     multiple role cycles, reconciliation interval `=0` (event-driven still 
works), periodic-loop
     recovery after an external divergence, **and the two halves of the 
peer-view contract**:
     peer absent → existing `role2` preserved (no write fires), peer present → 
stale `role2`
     reconciles to live state.
   * Unit tests in `ClusterRoleRecordTest` (21 cases, all passing) covering 
JSON backward-compat
     (missing `registryType` defaults to RPC, explicit RPC round-trips, 
explicit ZK round-trips,
     full ZK toJson/fromJson round-trip preserves `registryType`) and the
     `isLogicallyEqualIgnoringVersionAndRegistry` helper (null, 
same-modulo-version, ZK-vs-RPC
     URL form, role mismatch, name mismatch). Backed by three JSON fixtures 
under
     `phoenix-core/src/test/resources/json/`.
   * `mvn spotless:check` clean across the touched modules.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude (Anthropic) and Cursor (Claude).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to