ritegarg opened a new pull request, #2479:
URL: https://github.com/apache/phoenix/pull/2479
### What changes were proposed in this pull request?
Adds an opt-in sync from `/phoenix/consistentHA/<G>` to the legacy
`/phoenix/ha/<G>` znode in
each region server's `HAGroupStoreClient`. Each client derives a combined
`ClusterRoleRecord`
(local role + peer role) and writes it to its local `/phoenix/ha` via ZK
stat-version CAS, so
pre-consistentHA ZK-registry clients keep reading a current view of the HA
group.
**New config keys** (in `QueryServices` / `QueryServicesOptions`):
* `phoenix.ha.legacy.crr.sync.enabled` — master switch (default `false`).
* `phoenix.ha.legacy.crr.reconciliation.interval.seconds` — periodic
reconciliation cadence
(default `60`; `0` disables only the periodic loop, the event-driven path
still runs).
**Convergence model:**
* Event-driven on consistentHA `CHILD_ADDED` / `CHILD_UPDATED` (LOCAL or
PEER), offloaded onto
a per-group single-thread `ScheduledExecutorService` so Curator's
per-namespace event
dispatcher never blocks on ZK or JDBC I/O.
* Periodic reconciler on the same executor (0–30s jitter on first run).
* Curator `NodeCache` watcher on the legacy znode itself, so per-sync reads
are in-memory
rather than a ZK round-trip. The NodeCache is eventually consistent, so on
apparent absence
the sync path falls back to one authoritative `getData()` to disambiguate
"absent" from
"not-yet-cached" before deciding `CREATE_NEW` vs `CAS_WITH_VERSION`.
* On CAS `BadVersion`: log at INFO and bail. The next event/periodic cycle
reconverges. No
inline retry.
**Code-level changes:**
* `ClusterRoleRecord`: new explicit-`RegistryType` ctor; `@JsonCreator` now
round-trips
`registryType` (legacy znode is always written with `RegistryType.ZK` for
backward
compatibility, and must read back as ZK). New
`isLogicallyEqualIgnoringVersionAndRegistry`
helper used as the rewrite short-circuit.
* `PhoenixHAAdmin`:
* Three new helpers on the legacy namespace:
* `getClusterRoleRecordAndStatInZooKeeper` — atomic (data, stat)
read via
`storingStatIn`.
* `createOrUpdateClusterRoleRecordWithCAS(name, record,
LegacyCrrWriteMode mode, int
expectedStatVersion)` — typed write API. `LegacyCrrWriteMode` is a
Phoenix-internal
enum with values `CREATE_NEW`, `FORCE_OVERWRITE`
(`@VisibleForTesting`),
`CAS_WITH_VERSION`; the `int` argument is meaningful only for
`CAS_WITH_VERSION`
(validated `>= 0` at entry). Both `BadVersion` (CAS lost) and
`NodeExists` (create
lost) surface as `StaleClusterRoleRecordVersionException` so
callers retry
uniformly.
* Atomic-read fix on the existing `getHAGroupStoreRecordInZooKeeper`
(replaces a
pre-existing 2-RPC pattern that could return a stat version not
matching its data
bytes).
* `HAGroupStoreClient`: `legacyHaAdmin` + `legacyCrrNodeCache` + dedicated
periodic executor.
Listener hooks for `CHILD_ADDED` / `CHILD_UPDATED`. `CHILD_REMOVED` is
intentionally a
no-op — the legacy `/phoenix/ha` znode is never deleted by this client.
* New `StaleClusterRoleRecordVersionException` (CRR-specific analog of
`StaleHAGroupStoreRecordVersionException`).
**Concurrency / lifecycle invariants:**
* All sync invocations (initial, event-driven, periodic) route through the
single-thread
executor, so they are serialized naturally; no per-instance lock is held
across ZK I/O.
* The initial sync is submitted to the executor rather than invoked inline
in the constructor,
so a fresh `HAGroupStoreClient` does not block `getInstanceForZkUrl`'s
static class monitor
on ZK / JDBC I/O.
* `syncLegacyCRRIfRoleChanged` snapshots `legacyHaAdmin` /
`legacyCrrNodeCache` to local
finals before touching them, so a racing `close()` cannot NPE an in-flight
call or push
writes through a torn-down Curator client.
* `close()` marks the instance `isHealthy=false` and removes it from the
static `instances`
map up front, so a concurrent `getInstanceForZkUrl` cannot hand out a
half-closed client.
**Back-compat and divergence guards:**
* Refuse to overwrite a non-ZK admin-managed legacy CRR (typically
`RegistryType.RPC` with
master-form URLs). Existing readers feed the stored URLs through
`JDBCUtil.formatUrl(url, roleRecord.getRegistryType())` to build
connection strings;
swapping `registryType` + URLs out from under them would silently change
their target. The
sync skips with a WARN and requires an explicit admin migration to ZK form
before
proceeding.
* Wait for a non-blank peer ZK URL on the local record before building the
desired CRR;
`ClusterRoleRecord`'s ctor would otherwise NPE on `null url2` via
`JDBCUtil.formatUrl`.
* When the local peer cache returns null, preserve `existing.role2` in the
desired CRR
rather than downgrading to `UNKNOWN`. `UNKNOWN` is strictly less
information than a
previously-recorded role, and downgrading would cause a multi-RS write
storm where peers
with differing visibility CAS-overwrite each other.
### Why are the changes needed?
The Consistent Cluster Failover work moved HA group state from `/phoenix/ha`
(which held a
`ClusterRoleRecord`) to `/phoenix/consistentHA` (which holds an
`HAGroupStoreRecord` and uses
an RPC-registry view of CRRs). Existing Phoenix clients built against the
older ZK-registry
contract still read `/phoenix/ha/<G>` and expect:
* The znode to exist for any live HA group.
* The record's `registryType` to be `ZK`.
* The roles to reflect the latest state of both clusters.
Without this patch, after a CCF rollout the `/phoenix/ha` znode goes stale
or absent, and old
ZK-registry clients break. This change keeps `/phoenix/ha` in sync with the
new authoritative
`/phoenix/consistentHA` records on a per-RS basis, gated by an opt-in flag
so it only runs
where the operator needs it.
### Does this PR introduce _any_ user-facing change?
Yes, but the new behavior is gated entirely behind the new master switch
(`phoenix.ha.legacy.crr.sync.enabled`), which defaults to `false`.
When the flag is enabled:
* Each `HAGroupStoreClient` writes to (and maintains) the legacy
`/phoenix/ha/<G>` znode on
its local ZK.
* Old pre-consistentHA ZK-registry clients pointed at `/phoenix/ha` will see
fresh records
again.
* The flag is read once at construction; toggling it off requires a process
restart. This is
documented in `HAGroupStoreClient`'s class Javadoc.
* **Rollout constraint:** the persisted legacy znode now carries the new
`registryType`
field. Pre-PR clients using the default Jackson `ObjectMapper`
(`FAIL_ON_UNKNOWN_PROPERTIES=true`) will reject the new bytes. All readers
should be
upgraded to a tolerant build **before** enabling this flag on any writer.
One incidental visible change independent of the flag: `ClusterRoleRecord`
JSON
deserialization now respects the persisted `registryType` field. Previously
it was hard-coded
to `RPC` regardless of JSON content. Existing JSON without a `registryType`
field still
defaults to `RPC`, so deployments not exposed to legacy-sync writes see no
behavior change.
### How was this patch tested?
* Integration tests in `HAGroupStoreClientIT` (40 cases, all passing)
covering: feature-off
no-op, initial create, role-changing transition, state-only transition (no
rewrite),
peer-driven role flip, peer-absent → `role2=UNKNOWN` → recovery, CAS error
mapping +
write-mode dispatch (`CREATE_NEW`, `FORCE_OVERWRITE`, `CAS_WITH_VERSION`
negative-version
rejection), LOCAL/PEER `CHILD_REMOVED` (znode preserved), `registryType`
stays ZK across
multiple role cycles, reconciliation interval `=0` (event-driven still
works), periodic-loop
recovery after an external divergence, **and the two halves of the
peer-view contract**:
peer absent → existing `role2` preserved (no write fires), peer present →
stale `role2`
reconciles to live state.
* Unit tests in `ClusterRoleRecordTest` (21 cases, all passing) covering
JSON backward-compat
(missing `registryType` defaults to RPC, explicit RPC round-trips,
explicit ZK round-trips,
full ZK toJson/fromJson round-trip preserves `registryType`) and the
`isLogicallyEqualIgnoringVersionAndRegistry` helper (null,
same-modulo-version, ZK-vs-RPC
URL form, role mismatch, name mismatch). Backed by three JSON fixtures
under
`phoenix-core/src/test/resources/json/`.
* `mvn spotless:check` clean across the touched modules.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Anthropic) and Cursor (Claude).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]