Copilot commented on code in PR #10485:
URL: https://github.com/apache/ozone/pull/10485#discussion_r3397140251


##########
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/ratis/TestOzoneManagerRatisServer.java:
##########
@@ -145,6 +145,37 @@ public void testStartOMRatisServer() throws Exception {
         "Ratis Server should be in running state");
   }
 
+  /**
+   * RaftPeer.address must always be a hostname:port string, never an
+   * IP:port string. This is what allows gRPC's DnsNameResolver to
+   * re-resolve the peer when its underlying IP changes (Kubernetes pod
+   * restart). If a resolved IP were baked in, recovery would require
+   * a full OM restart.
+   */

Review Comment:
   The Javadoc says RaftPeer.address must never be an IP:port string, but the 
actual requirement for HDDS-15514 is to preserve the configured host string and 
avoid pre-resolving a hostname to a numeric IP before handing it to Ratis/gRPC. 
Rewording this avoids misleading readers into thinking numeric IP configuration 
is invalid.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMRatisServerImpl.java:
##########
@@ -396,8 +396,14 @@ private static RaftGroup buildRaftGroup(SCMNodeDetails 
details,
     final RaftGroupId groupId = buildRaftGroupId(clusterId);
     RaftPeerId selfPeerId = getSelfPeerId(scmId);
 
+    // The peer address is intentionally a hostname:port string, never a
+    // resolved IP. Ratis hands this string to gRPC's NettyChannelBuilder,
+    // whose default DnsNameResolver re-resolves hostnames on connection
+    // failure. Baking a resolved IP would freeze the peer at one IP,
+    // breaking recovery from peer pod restarts in Kubernetes-style
+    // environments where DNS names are stable but IPs are not. See
+    // HDDS-15514 (DNS-refresh-on-failure for all RPC paths).

Review Comment:
   The new comment says the peer address is never a resolved IP; the important 
invariant is that the configured host string is passed through (so a hostname 
stays a hostname) rather than constructing an InetSocketAddress from an 
InetAddress. Consider rewording to avoid implying that numeric IP configuration 
is inherently invalid.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAManagerImpl.java:
##########
@@ -118,9 +118,11 @@ public void start() throws IOException {
       final boolean success = HAUtils.addSCM(ozoneConf,
           new AddSCMRequest.Builder().setClusterId(scm.getClusterId())
               .setScmId(scm.getScmId())
-              .setRatisAddr(nodeDetails
-                  // TODO : Should we use IP instead of hostname??
-                  .getRatisHostPortStr()).build(), scm.getSCMNodeId());
+              // Hostname is intentional: gRPC's DnsNameResolver re-resolves
+              // hostnames on connection failure, so SCM peer pod restarts
+              // are recovered automatically. See HDDS-15514.

Review Comment:
   This comment implies hostname is always required. The main requirement is to 
avoid baking a resolved IP into the Ratis address; passing through the 
configured host string preserves hostnames when used, while still allowing 
explicit IP configuration when desired.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/OzoneManagerRatisServer.java:
##########
@@ -435,44 +435,29 @@ private void updateRatisConfiguration(List<RaftPeer> 
followers, List<RaftPeer> l
     }
   }
 
-  private static RaftPeer createRaftPeer(OMNodeDetails omNode) {
-    String nodeId = omNode.getNodeId();
-    RaftPeerId raftPeerId = RaftPeerId.valueOf(nodeId);
-    InetSocketAddress ratisAddr = new InetSocketAddress(
-        omNode.getHostAddress(), omNode.getRatisPort());
-    RaftPeerRole startRole = omNode.isRatisListener() ?
-        RaftPeerRole.LISTENER : RaftPeerRole.FOLLOWER;
+  /**
+   * Build a RaftPeer for the given OM node. The peer address is always set
+   * as a hostname:port string (never a resolved IP). Ratis ultimately hands
+   * this string to gRPC's NettyChannelBuilder.forTarget(...), whose default
+   * DnsNameResolver re-resolves hostnames on connection failure / refresh.
+   * Baking a resolved IP into RaftPeer.address would freeze the peer at one
+   * IP for the channel's lifetime, breaking recovery from peer pod restarts
+   * in environments like Kubernetes where DNS names are stable but IPs are
+   * not. See HDDS-15514 (DNS-refresh-on-failure for all RPC paths).
+   */

Review Comment:
   This Javadoc states the peer address is always a hostname:port string, which 
is stronger than what the implementation guarantees/needs. The key invariant 
is: don’t pass a resolved InetAddress/InetSocketAddress (which bakes in a 
numeric IP); instead, pass the configured host string through to gRPC so it can 
re-resolve when needed. Rewording will better match the behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to