Copilot commented on code in PR #10486:
URL: https://github.com/apache/ozone/pull/10486#discussion_r3397197937


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMProxyInfo.java:
##########
@@ -106,10 +135,94 @@ public synchronized void 
createProxyIfNeeded(CheckedFunction<InetSocketAddress,
   }
 
   /**
-   * A {@link OMProxyInfo} map with a particular order,
+   * Re-resolve {@link #rpcAddrStr} via DNS and, if the resolved IP has
+   * changed, replace the cached {@link #rpcAddr} (and the derived
+   * delegation-token service) and discard the cached proxy so that the
+   * next {@link #createProxyIfNeeded} call dials the new IP. The stale
+   * RPC proxy is closed via {@link RPC#stopProxy} so the underlying
+   * Hadoop {@code Client} connection thread and authenticated SASL
+   * session against the gone-away peer are not leaked.
+   * <p>
+   * Returns true when a swap occurred. Off the failure path this is a
+   * no-op (returns false): unchanged IP, unresolved lookup, or
+   * malformed host string.
+   * <p>
+   * The DNS lookup and the {@code RPC.stopProxy} call are performed
+   * outside the entry monitor so that a slow / dead resolver or a
+   * blocking proxy teardown does not freeze concurrent readers of
+   * {@link #getAddress()} / {@link #getProxy()}.
+   */
+  public boolean refreshAddressIfChanged() {
+    final InetSocketAddress refreshed;
+    try {
+      refreshed = NetUtils.createSocketAddr(rpcAddrStr);
+    } catch (IllegalArgumentException ex) {
+      LOG.warn("Failed to re-resolve OM address {}: {}", rpcAddrStr,
+          ex.getMessage());

Review Comment:
   Similarly here, logging only ex.getMessage() loses the stack trace for 
malformed addresses / resolution errors. Include the exception so operators can 
see the full failure details.



##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMProxyInfo.java:
##########
@@ -83,11 +103,20 @@ public String getAddressString() {
     return rpcAddrStr;
   }
 
-  public InetSocketAddress getAddress() {
+  public synchronized InetSocketAddress getAddress() {
     return rpcAddr;
   }
 
-  public Text getDelegationTokenService() {
+  /**
+   * Test-only: inject a deliberately stale cached address to drive
+   * the DNS-refresh code path without standing up a real OM.
+   */
+  @VisibleForTesting
+  synchronized void setCachedAddressForTest(InetSocketAddress address) {
+    this.rpcAddr = address;
+  }

Review Comment:
   The test-only setter currently allows null, which can lead to a confusing 
NPE later in refreshAddressIfChanged() when it dereferences rpcAddr. Enforcing 
non-null here keeps OMProxyInfo invariants intact and makes failures clearer.



##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMProxyInfo.java:
##########
@@ -106,10 +135,94 @@ public synchronized void 
createProxyIfNeeded(CheckedFunction<InetSocketAddress,
   }
 
   /**
-   * A {@link OMProxyInfo} map with a particular order,
+   * Re-resolve {@link #rpcAddrStr} via DNS and, if the resolved IP has
+   * changed, replace the cached {@link #rpcAddr} (and the derived
+   * delegation-token service) and discard the cached proxy so that the
+   * next {@link #createProxyIfNeeded} call dials the new IP. The stale
+   * RPC proxy is closed via {@link RPC#stopProxy} so the underlying
+   * Hadoop {@code Client} connection thread and authenticated SASL
+   * session against the gone-away peer are not leaked.
+   * <p>
+   * Returns true when a swap occurred. Off the failure path this is a
+   * no-op (returns false): unchanged IP, unresolved lookup, or
+   * malformed host string.
+   * <p>
+   * The DNS lookup and the {@code RPC.stopProxy} call are performed
+   * outside the entry monitor so that a slow / dead resolver or a
+   * blocking proxy teardown does not freeze concurrent readers of
+   * {@link #getAddress()} / {@link #getProxy()}.
+   */
+  public boolean refreshAddressIfChanged() {
+    final InetSocketAddress refreshed;
+    try {
+      refreshed = NetUtils.createSocketAddr(rpcAddrStr);
+    } catch (IllegalArgumentException ex) {
+      LOG.warn("Failed to re-resolve OM address {}: {}", rpcAddrStr,
+          ex.getMessage());
+      return false;
+    }
+    if (refreshed.isUnresolved()) {
+      LOG.warn("OM hostname {} re-resolved to an unresolved address; "
+          + "leaving cached entry in place.", rpcAddrStr);
+      return false;
+    }
+    // Compute the new delegation-token service identifier OUTSIDE the
+    // entry monitor. SecurityUtil.buildTokenService is unlikely to throw
+    // for a resolved address, but if it ever did inside the swap block
+    // we'd be left with rpcAddr=new but dtService=old and proxy=non-null
+    // pointing at the old IP -- and the equality short-circuit at the
+    // top of the synchronized block below would skip every subsequent
+    // refresh attempt because rpcAddr already matches the new IP.
+    // Building first means a throw here aborts the whole refresh with
+    // no state change.
+    final Text newDtService = SecurityUtil.buildTokenService(refreshed);
+    final T staleProxy;
+    final InetSocketAddress old;
+    synchronized (this) {
+      // Null-safe IP comparison. The constructor accepts (with a warn)
+      // an unresolved rpcAddr -- in that case rpcAddr.getAddress() is
+      // null, and a successful re-resolution is genuinely a change so
+      // we MUST proceed to swap rather than NPE on .equals().
+      InetAddress cachedIp = rpcAddr.getAddress();
+      InetAddress refreshedIp = refreshed.getAddress();
+      if (cachedIp != null && refreshedIp != null
+          && refreshedIp.equals(cachedIp)) {
+        return false;
+      }
+      old = rpcAddr;
+      staleProxy = this.proxy;
+      this.rpcAddr = refreshed;
+      this.dtService = newDtService;
+      this.proxy = null;
+    }
+    if (staleProxy != null) {
+      try {
+        RPC.stopProxy(staleProxy);
+      } catch (RuntimeException stopEx) {
+        LOG.warn("Failed to stop stale OM proxy for nodeId {}: {}",
+            nodeId, stopEx.getMessage());

Review Comment:
   Logging only the exception message drops the stack trace, which can make 
proxy-stop failures harder to diagnose. Log the exception itself so the stack 
trace is preserved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to