functioner commented on code in PR #1782:
URL: https://github.com/apache/zookeeper/pull/1782#discussion_r2684773134


##########
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java:
##########
@@ -410,8 +410,12 @@ private Socket connectToLeader() throws IOException, 
X509Exception, InterruptedE
             int remainingTimeout;
             long startNanoTime = nanoTime();
 
-            for (int tries = 0; tries < 5 && socket.get() == null; tries++) {
+            for (int tries = 0; tries < 5; tries++) {
                 try {
+                    sock = createSocket();
+                    if (socket.get() == null) {
+                        continue;

Review Comment:
   @PAC-MAJ 
   
   The "continue;" you mentioned handles the case that `sock = createSocket();` 
gives null value. In this case, the abnormal null value comes from some 
internal issue of the zk learner, not from leader's connection issue. Hence 
introducing the backoff delay leaderConnectDelayDuringRetryMs will not make 
sense, because that delay should focus on coordinating learn-leader connection 
frequency.
   
   In a broader sense, the loop `for (int tries = 0; tries < 5; tries++)` is 
for the retry of creating socket AND connecting leader. Maybe this logic could 
be decoupled as two phases? For example, write a retry for loop for the 
leaderConnectDelayDuringRetryMs backoff delay, after successfully creating the 
socket.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to