SteNicholas commented on code in PR #2629:
URL: https://github.com/apache/celeborn/pull/2629#discussion_r1696710474


##########
common/src/main/java/org/apache/celeborn/common/client/MasterClient.java:
##########
@@ -226,14 +225,23 @@ private void resetRpcEndpointRef(@Nullable RpcEndpointRef oldRef) {
    *     cannot be obtained.
    * @return non-empty RpcEndpointRef.
    */
-  private RpcEndpointRef getOrSetupRpcEndpointRef(AtomicInteger currentIndex) {
+  private RpcEndpointRef getOrSetupRpcEndpointRef(AtomicInteger currentIndex, int currentAttempt) {
     RpcEndpointRef endpointRef = rpcEndpointRef.get();
+
+    List<String> activeMasterEndpoints = masterEndpointResolver.getActiveMasterEndpoints();
+    // If endpoints are updated by MasterEndpointResolver, we should reset the currentIndex to 0.
+    // This also unset the value of updated, so we don't always reset currentIndex to 0.
+    if (masterEndpointResolver.getUpdatedAndReset()) {
+      currentIndex.set(0);
+      maxRetries = Math.max(maxRetries, currentAttempt + activeMasterEndpoints.size());

Review Comment:
   Why does this change `maxRetries`? IMO, `maxRetries` is global to
`MasterClient` and should not depend on `currentAttempt`. Also, does this
change override the value set by the `celeborn.masterClient.maxRetries` config?
Am I missing something?
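
   To make the concern concrete, here is a minimal sketch (not the PR's code) of one alternative: keep the configured value immutable and derive a per-request bound from it. `RetryBudgetSketch`, `effectiveMaxRetries`, and the `endpointsUpdated` flag are hypothetical names used only for illustration; the flag is assumed to carry the result of `masterEndpointResolver.getUpdatedAndReset()`.

```java
import java.util.List;

public class RetryBudgetSketch {
  // Hypothetical stand-in for the value read from celeborn.masterClient.maxRetries.
  // It is final, so a single request can never change the client-wide setting.
  private final int configuredMaxRetries;

  public RetryBudgetSketch(int configuredMaxRetries) {
    this.configuredMaxRetries = configuredMaxRetries;
  }

  public int configuredMaxRetries() {
    return configuredMaxRetries;
  }

  // Retry bound for the current request only. If the endpoint list was refreshed
  // at currentAttempt, allow one more pass over the new endpoints; otherwise use
  // the configured value unchanged.
  public int effectiveMaxRetries(
      int currentAttempt, List<String> activeMasterEndpoints, boolean endpointsUpdated) {
    if (!endpointsUpdated) {
      return configuredMaxRetries;
    }
    return Math.max(configuredMaxRetries, currentAttempt + activeMasterEndpoints.size());
  }
}
```

   With this shape, the extended bound lives in a local variable of the retry loop, so later requests still start from the configured `maxRetries`.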



