[GitHub] [accumulo] ddanielr commented on a diff in pull request #3231: Fix wait timeout logic for available tservers

via GitHub Wed, 08 Mar 2023 23:14:35 -0800


ddanielr commented on code in PR #3231:
URL: https://github.com/apache/accumulo/pull/3231#discussion_r1130571661



##########
server/master/src/main/java/org/apache/accumulo/master/Master.java:
##########
@@ -1498,51 +1498,47 @@ private void blockForTservers() throws InterruptedException {
           Property.MASTER_STARTUP_TSERVER_AVAIL_MAX_WAIT.getKey());
       maxWait = Long.MAX_VALUE;
     }
+    long sleepInterval = maxWait / 10;
 
-    // honor Retry condition that initial wait < max wait, otherwise use small value to allow thread
-    // yield to happen
-    long initialWait = Math.min(50, maxWait / 2);
-
-    Retry tserverRetry =
-        Retry.builder().infiniteRetries().retryAfter(initialWait, TimeUnit.MILLISECONDS)
-            .incrementBy(15_000, TimeUnit.MILLISECONDS).maxWait(maxWait, TimeUnit.MILLISECONDS)
-            .logInterval(30_000, TimeUnit.MILLISECONDS).createRetry();
+    // Set a incremental logging delay
+    long logIncrement = 15_000;
+    long logWait = 0, lastLog = 0;
 
     log.info("Checking for tserver availability - need to reach {} servers. Have {}",
         minTserverCount, tserverSet.size());
 
     boolean needTservers = tserverSet.size() < minTserverCount;
 
-    while (needTservers && tserverRetry.canRetry()) {
-
-      tserverRetry.waitForNextAttempt();
-
+    while (needTservers && ((System.currentTimeMillis() - waitStart) < maxWait)) {
       needTservers = tserverSet.size() < minTserverCount;
 
-      // suppress last message once threshold reached.
-      if (needTservers) {
+      // Determine when to log a message
+      if (needTservers && ((System.currentTimeMillis() - lastLog) > logWait)) {
         log.info(
             "Blocking for tserver availability - need to reach {} servers. Have {}"
                 + " Time spent blocking {} sec.",
             minTserverCount, tserverSet.size(),
             TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis() - waitStart));
+        lastLog = System.currentTimeMillis();
+        logWait = logWait + logIncrement;
       }
     }
 
     if (tserverSet.size() < minTserverCount) {
       log.warn(
           "tserver availability check time expired - continuing. Requested {}, have {} tservers on line. "
-              + " Time waiting {} ms",
+              + " Time waiting {} sec",
           tserverSet.size(), minTserverCount,
           TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis() - waitStart));
 
     } else {
       log.info(
           "tserver availability check completed. Requested {}, have {} tservers on line. "
-              + " Time waiting {} ms",
+              + " Time waiting {} sec",
           tserverSet.size(), minTserverCount,
           TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis() - waitStart));
     }
+    sleepUninterruptibly(sleepInterval, TimeUnit.MILLISECONDS);

Review Comment:
   I originally chose an uninterruptible sleep because this class already calls that method in 8 other places without checking the interrupt status. I don't feel strongly about it, though, and am happy to switch to a normal sleep that handles the interrupt.
   
   Since `blockForTservers` is called from the main thread, do we need to handle the interrupt status with a try/catch block in the original `run` method?
   
   `main` creates a new `Master` object and calls `run`:
   
https://github.com/apache/accumulo/blob/da2b7ed883a8e99c733fc557032b1a45544cddf1/server/master/src/main/java/org/apache/accumulo/master/Master.java#L1648-L1671
   
   `run` calls `blockForTservers` in the same thread:
   
https://github.com/apache/accumulo/blob/da2b7ed883a8e99c733fc557032b1a45544cddf1/server/master/src/main/java/org/apache/accumulo/master/Master.java#L1258-L1275
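
   For comparison, the interruptible alternative could look roughly like the sketch below. This is a standalone illustration, not the Accumulo code: the class `BlockingWaitSketch`, the method `waitFor`, and its parameters are hypothetical names. It polls a condition until a deadline using a plain interruptible sleep, and on interrupt it restores the thread's interrupt flag rather than swallowing it, so a caller further up the stack can still observe the interrupt.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: deadline-bounded polling with an interruptible sleep.
final class BlockingWaitSketch {

  /**
   * Polls until the condition holds or maxWaitMs elapses.
   * Returns true if the condition was satisfied. If the thread is
   * interrupted while sleeping, the interrupt flag is restored and the
   * current state of the condition is returned immediately.
   */
  static boolean waitFor(BooleanSupplier satisfied, long maxWaitMs, long sleepMs) {
    long start = System.nanoTime();
    // toNanos saturates at Long.MAX_VALUE, so a "wait forever" value is safe.
    long maxWaitNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
    long sleepNanos = TimeUnit.MILLISECONDS.toNanos(sleepMs);

    while (!satisfied.getAsBoolean()) {
      long remaining = maxWaitNanos - (System.nanoTime() - start);
      if (remaining <= 0) {
        return false; // deadline reached without satisfying the condition
      }
      try {
        // Never sleep past the deadline.
        TimeUnit.NANOSECONDS.sleep(Math.min(remaining, sleepNanos));
      } catch (InterruptedException e) {
        // Restore the flag so the caller can decide how to react.
        Thread.currentThread().interrupt();
        return satisfied.getAsBoolean();
      }
    }
    return true;
  }

  public static void main(String[] args) {
    AtomicInteger servers = new AtomicInteger(0);
    int minServers = 3;

    // Simulates servers registering over time.
    Thread registrar = new Thread(() -> {
      for (int i = 0; i < minServers; i++) {
        try {
          Thread.sleep(20);
        } catch (InterruptedException e) {
          return;
        }
        servers.incrementAndGet();
      }
    });
    registrar.start();

    boolean reached = waitFor(() -> servers.get() >= minServers, 2_000, 10);
    System.out.println("reached minimum: " + reached + ", have " + servers.get());
  }
}
```

   With this shape the waiting method does not need to declare `InterruptedException`, and no try/catch is required in the caller. The trade-off is that the caller has to check `Thread.currentThread().isInterrupted()` if it wants to treat an interrupt differently from a timeout.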



