Copilot commented on code in PR #10487:
URL: https://github.com/apache/ozone/pull/10487#discussion_r3397202080


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMProxyInfo.java:
##########
@@ -83,11 +103,20 @@ public String getAddressString() {
     return rpcAddrStr;
   }
 
-  public InetSocketAddress getAddress() {
+  public synchronized InetSocketAddress getAddress() {
     return rpcAddr;
   }
 
-  public Text getDelegationTokenService() {
+  /**
+   * Test-only: inject a deliberately stale cached address to drive
+   * the DNS-refresh code path without standing up a real OM.
+   */
+  @VisibleForTesting
+  synchronized void setCachedAddressForTest(InetSocketAddress address) {
+    this.rpcAddr = address;
+  }

Review Comment:
   The test-only setter allows setting rpcAddr to null, but 
refreshAddressIfChanged() assumes rpcAddr is always non-null and will NPE if a 
test accidentally passes null. Guard the test hook with a null check to keep 
failures clear and localized.



##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/HadoopRpcOMFollowerReadFailoverProxyProvider.java:
##########
@@ -385,8 +385,14 @@ public void close() throws IOException {
 
     @Override
     public ConnectionId getConnectionId() {
+      // Read the proxy through the synchronized accessor instead of the
+      // inherited public field. With DNS-refresh-on-failure, OMProxyInfo
+      // mutates the proxy field under its monitor, so a direct field
+      // read can observe a torn value (a stale proxy whose underlying
+      // connection was just stopped, or a fresh proxy that was just
+      // installed without proper visibility for the reader).

Review Comment:
   The comment says a direct field read can observe a "torn" value. Object 
references are not subject to torn reads; the real concern here is stale reads 
/ missing visibility without synchronization. Rewording avoids a misleading 
concurrency term.



##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/ratis/TestOzoneManagerRatisServer.java:
##########
@@ -145,6 +145,37 @@ public void testStartOMRatisServer() throws Exception {
         "Ratis Server should be in running state");
   }
 
+  /**
+   * RaftPeer.address must always be a hostname:port string, never an
+   * IP:port string. This is what allows gRPC's DnsNameResolver to
+   * re-resolve the peer when its underlying IP changes (Kubernetes pod
+   * restart). If a resolved IP were baked in, recovery would require
+   * a full OM restart.

Review Comment:
   This test comment states RaftPeer.address must always be "hostname:port" and 
never "IP:port". The important invariant is "don’t bake a resolved IP" (i.e., 
keep the configured host string); IP literals can still be a valid 
configuration. Rewording makes the rationale accurate.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java:
##########
@@ -435,44 +435,29 @@ private void updateRatisConfiguration(List<RaftPeer> 
followers, List<RaftPeer> l
     }
   }
 
-  private static RaftPeer createRaftPeer(OMNodeDetails omNode) {
-    String nodeId = omNode.getNodeId();
-    RaftPeerId raftPeerId = RaftPeerId.valueOf(nodeId);
-    InetSocketAddress ratisAddr = new InetSocketAddress(
-        omNode.getHostAddress(), omNode.getRatisPort());
-    RaftPeerRole startRole = omNode.isRatisListener() ?
-        RaftPeerRole.LISTENER : RaftPeerRole.FOLLOWER;
+  /**
+   * Build a RaftPeer for the given OM node. The peer address is always set
+   * as a hostname:port string (never a resolved IP). Ratis ultimately hands
+   * this string to gRPC's NettyChannelBuilder.forTarget(...), whose default
+   * DnsNameResolver re-resolves hostnames on connection failure / refresh.

Review Comment:
   The Javadoc says the RaftPeer address is always a "hostname:port" string. In 
practice it should preserve the configured host string (hostname *or* IP 
literal) and avoid baking a Java-resolved InetAddress into the peer. Rewording 
clarifies the real invariant and avoids implying IP-based configs are invalid.



##########
hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/ha/TestOMProxyInfoDnsRefresh.java:
##########
@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.om.ha;
+
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertNotEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+import static org.junit.jupiter.api.Assertions.assertNull;
+import static org.junit.jupiter.api.Assertions.assertSame;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+import java.net.InetAddress;
+import java.net.InetSocketAddress;
+import org.junit.jupiter.api.Test;
+
+/**
+ * Verifies that {@link OMProxyInfo#refreshAddressIfChanged()} correctly
+ * detects DNS changes -- the Kubernetes pod-IP-change recovery path on
+ * the Client → OM RPC route.
+ */
+public class TestOMProxyInfoDnsRefresh {
+
+  /**
+   * When DNS for the configured hostname now returns the same IP that
+   * is already cached, refresh is a no-op. Returns false; cached
+   * address and proxy are untouched. Critically, the cached proxy must
+   * NOT be discarded -- a regression that nulled {@code proxy}
+   * unconditionally would tear down a healthy connection on every
+   * application-level failure.
+   */
+  @Test
+  public void testRefreshIsNoopWhenIpUnchanged() throws Exception {
+    Object originalProxy = new Object();
+    OMProxyInfo<Object> info = OMProxyInfo.newInstance(
+        originalProxy, "svc", "om1", "localhost:9862");
+    InetSocketAddress before = info.getAddress();
+
+    boolean swapped = info.refreshAddressIfChanged();
+
+    assertFalse(swapped, "no swap when DNS resolves to the same IP");
+    assertSame(before, info.getAddress(),
+        "cached address must not be replaced when IP is unchanged");
+    assertSame(originalProxy, info.getProxy(),
+        "cached proxy must NOT be discarded on a no-op refresh");
+  }
+
+  /**
+   * To drive the change-detection path we construct an OMProxyInfo
+   * pointing at "localhost", then inject a deliberately stale IP via
+   * the test hook. Re-resolving "localhost" then yields the live
+   * loopback IP, the cached stale IP differs, and the swap fires.
+   */
+  @Test
+  public void testRefreshSwapsAddressOnIpChange() throws Exception {
+    OMProxyInfo<Object> info = OMProxyInfo.newInstance(
+        /*proxy=*/ null, "svc", "om1", "localhost:9862");
+
+    InetSocketAddress staleAddr = new InetSocketAddress(
+        InetAddress.getByAddress(new byte[] {127, 0, 0, 99}), 9862);
+    info.setCachedAddressForTest(staleAddr);
+
+    boolean swapped = info.refreshAddressIfChanged();
+    assertTrue(swapped, "swap must fire when DNS returns a different IP "
+        + "than the stale 127.0.0.99 we forced into the cache");
+    assertNotEquals(staleAddr.getAddress(), info.getAddress().getAddress(),
+        "cached address must hold the freshly-resolved IP after swap");
+    assertNull(info.getProxy(),
+        "cached proxy must be discarded so the next dial uses the new IP");
+  }
+
+  /**
+   * createProxyIfNeeded rebuilds the proxy from the freshly-resolved
+   * address after a swap. The lambda asserts the parameter equals the
+   * post-refresh address -- a regression that passes a stale or null
+   * address to the factory would fire here.
+   */
+  @Test
+  public void testProxyRebuildsAfterRefreshUsesNewAddress() throws Exception {
+    OMProxyInfo<Object> info = OMProxyInfo.newInstance(
+        new Object(), "svc", "om1", "localhost:9862");
+
+    InetSocketAddress staleAddr = new InetSocketAddress(
+        InetAddress.getByAddress(new byte[] {127, 0, 0, 99}), 9862);
+    info.setCachedAddressForTest(staleAddr);
+    assertTrue(info.refreshAddressIfChanged());
+    assertNull(info.getProxy());
+
+    InetSocketAddress expectedNewAddress = info.getAddress();
+    Object freshProxy = new Object();
+    InetSocketAddress[] dialedWith = new InetSocketAddress[1];
+    info.createProxyIfNeeded(addr -> {
+      dialedWith[0] = addr;
+      return freshProxy;
+    });
+
+    assertSame(expectedNewAddress, dialedWith[0],
+        "factory must be invoked with the freshly-resolved address, "
+            + "not the stale one or null");
+    assertSame(freshProxy, info.getProxy());
+  }
+
+  /**
+   * dtService must update alongside rpcAddr on a successful swap.
+   * Stale dtService after refresh is a fragile invariant that would
+   * silently break post-refresh authentication.
+   */
+  @Test
+  public void testRefreshUpdatesDelegationTokenService() throws Exception {
+    OMProxyInfo<Object> info = OMProxyInfo.newInstance(
+        new Object(), "svc", "om1", "localhost:9862");
+    InetSocketAddress staleAddr = new InetSocketAddress(
+        InetAddress.getByAddress(new byte[] {127, 0, 0, 99}), 9862);
+    info.setCachedAddressForTest(staleAddr);
+    // dtService was built from the original "localhost" resolution; in
+    // SecurityUtil.buildTokenService form the value depends on the IP.
+    // After we forced the stale IP and refresh, dtService should be
+    // recomputed against the live IP. We can't easily compare exact
+    // strings (the value depends on hadoop.security.token.service.use_ip)
+    // but we can assert the field is non-null and that the address
+    // backing it matches the post-refresh address.
+    assertNotNull(info.getDelegationTokenService());
+    assertTrue(info.refreshAddressIfChanged());
+    assertNotNull(info.getDelegationTokenService(),
+        "dtService must be rebuilt after a successful swap");
+  }

Review Comment:
   testRefreshUpdatesDelegationTokenService() is currently vacuous: 
setCachedAddressForTest() only changes rpcAddr, while dtService remains built 
from the original (already-correct) resolution, so the test can pass even if 
refreshAddressIfChanged() forgets to update dtService. Make dtService 
deliberately stale so the assertion becomes load-bearing, and then assert it 
matches SecurityUtil.buildTokenService() for the refreshed address.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to