ritegarg commented on code in PR #2479:
URL: https://github.com/apache/phoenix/pull/2479#discussion_r3291064402
##########
phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/HAGroupStoreClient.java:
##########
@@ -1047,6 +1114,177 @@ private long
validateTransitionAndGetWaitTime(HAGroupStoreRecord.HAGroupState cu
return Math.max(0, remainingTime);
}
+ // ========== Legacy /phoenix/ha CRR Sync ==========
+
+ /**
+ * Derives the combined CRR from local + peer records and CAS-writes it to
{@code /phoenix/ha}.
+ * CAS losses are logged and skipped; the next consistentHA cache event or
periodic cycle
+ * reconverges.
+ */
+ private void syncLegacyCRRIfRoleChanged() {
+ if (!legacyCrrSyncEnabled || !isHealthy) {
+ return;
+ }
+ // Snapshot mutable resources up front so a concurrent close() can't null
them mid-method
+ // and trigger NPEs / writes through a torn-down Curator client.
+ final PhoenixHAAdmin admin = this.legacyHaAdmin;
+ final NodeCache cache = this.legacyCrrNodeCache;
+ if (admin == null || cache == null) {
+ return;
+ }
+ try {
+ HAGroupStoreRecord local = getHAGroupStoreRecord();
+ if (local == null) {
+ LOGGER.debug("Skipping legacy CRR sync for HA group {}: no local
consistentHA record",
+ haGroupName);
+ return;
+ }
+ // Wait for peer URL before building the desired CRR (ctor NPEs on null
url2).
+ if (StringUtils.isBlank(local.getPeerZKUrl())) {
+ LOGGER.debug("Skipping legacy CRR sync for HA group {}: peer ZK URL is
blank", haGroupName);
+ return;
+ }
+ HAGroupStoreRecord peer = getHAGroupStoreRecordFromPeer();
+ // NodeCache is eventually consistent; on apparent absence, fall back to
an authoritative
+ // ZK read so the equality check and CAS both see consistent state.
+ Pair<ClusterRoleRecord, Stat> snapshot = readLegacyCrrSnapshot(cache);
+ if (snapshot.getRight() == null) {
+ snapshot = admin.getClusterRoleRecordAndStatInZooKeeper(haGroupName);
+ }
+ ClusterRoleRecord existing = snapshot.getLeft();
+ Stat existingStat = snapshot.getRight();
+ if (!shouldWriteLegacyCrr(existing)) {
+ return;
+ }
+ ClusterRoleRecord desired = buildDesiredLegacyCrr(local, peer, existing);
+ if (desired.isLogicallyEqualIgnoringVersionAndRegistry(existing)) {
+ LOGGER.debug("Legacy CRR for HA group {} already up to date at version
{}", haGroupName,
+ existing.getVersion());
+ return;
+ }
+ try {
+ if (existingStat == null) {
+ admin.createOrUpdateClusterRoleRecordWithCAS(haGroupName, desired,
+ PhoenixHAAdmin.LegacyCrrWriteMode.CREATE_NEW, /* ignored */ 0);
+ } else {
+ admin.createOrUpdateClusterRoleRecordWithCAS(haGroupName, desired,
+ PhoenixHAAdmin.LegacyCrrWriteMode.CAS_WITH_VERSION,
existingStat.getVersion());
+ }
+ LOGGER.info("Synced legacy CRR for HA group {} (version {} -> {})",
haGroupName,
+ existing != null ? existing.getVersion() : -1L,
desired.getVersion());
+ } catch (StaleClusterRoleRecordVersionException stale) {
+ // CAS lost; next event/periodic cycle reconverges.
+ LOGGER.info("Legacy CRR CAS lost for HA group {} at expected stat
version {}", haGroupName,
+ existingStat != null ? existingStat.getVersion() : -1);
+ }
+ } catch (Exception e) {
+ LOGGER.warn(
+ "Legacy CRR sync failed for HA group {}; will be retried by next
event/periodic cycle",
+ haGroupName, e);
+ }
+ }
+
+ /**
+ * Policy gate before issuing a CAS write to the legacy CRR. Returns {@code
false} when the
+ * existing record must not be overwritten by this client.
+ */
+ private boolean shouldWriteLegacyCrr(ClusterRoleRecord existing) {
+ // Refuse to overwrite a non-ZK (admin-managed RPC) legacy CRR; live
readers use its
+ // registryType to build connection strings, so swapping form would break
them.
+ if (existing != null && existing.getRegistryType() != RegistryType.ZK) {
Review Comment:
The feature branch hasn't shipped, so every pre-existing /phoenix/ha/<G>
record was written by apache:master or earlier. Master's toJson always emits
registryType, so the three realistic record shapes all behave correctly:
- Master ZK record ("registryType":"ZK" explicit) → parses cleanly →
shouldWriteLegacyCrr true → sync proceeds.
- Admin-managed RPC record ("registryType":"RPC" explicit) → parses as RPC →
gate fires → sync skips (intended).
- Pre-PHOENIX-7495 record (zk1/zk2 keys, no registryType) → strict Jackson
rejects unknown fields → existing=null, stat populated → CAS_WITH_VERSION
branch overwrites with a fresh ZK record.
The "no registryType AND RPC-parsable URL" combination can't be produced by
any shipped writer, so no live record is locked out. Added
testLegacyCrrSyncMigratesOlderZk1Zk2Record in HAGroupStoreClientIT (14/14
legacy-sync ITs green) covering the pre-PHOENIX-7495 path explicitly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]