ctubbsii commented on code in PR #2990:
URL: https://github.com/apache/accumulo/pull/2990#discussion_r988329538


##########
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java:
##########
@@ -225,6 +230,40 @@ public boolean stillManager() {
     return getManagerState() != ManagerState.STOP;
   }
 
+  /**
+   * Retrieve the Fate object, blocking until it is ready. This could cause problems if Fate
+   * operations are attempted to be used prior to the Manager being ready for them. If these
+   * operations are triggered by a client side request from a tserver or client, it should be safe
+   * to wait to handle those until Fate is ready, but if it occurs during an upgrade, or some other
+   * time in the Manager before Fate is started, that may result in a deadlock and will need to be
+   * fixed.
+   *
+   * @return the Fate object, only after the fate components are running and ready
+   */
+  Fate<Manager> fate() {
+    try {
+      // block up to 30 seconds until it's ready; if it's still not ready, introduce some logging
+      if (!fateReadyLatch.await(30, TimeUnit.SECONDS)) {
+        String msgPrefix = "Unexpected use of fate in thread " + Thread.currentThread().getName()
+            + " at time " + System.currentTimeMillis();
+        // include stack trace so we know where it's coming from, in case we need to troubleshoot it
+        log.warn("{} blocked until fate starts", msgPrefix,
+            new IllegalStateException("Attempted fate action before manager finished starting up; "
+                + "if this doesn't make progress, please report it as a bug to the developers"));
+        int minutes = 0;
+        while (!fateReadyLatch.await(5, TimeUnit.MINUTES)) {
+          minutes += 5;
+          log.warn("{} still blocked after {} minutes; this is getting weird", msgPrefix, minutes);
+        }
+        log.debug("{} no longer blocked", msgPrefix);
+      }

Review Comment:
   I think catching the exception and throwing something the client code should handle differently would be a much bigger change: it would involve updating the RPC method signatures and adding client-side code to handle it differently than the current loop.

   I would prefer to just give it a bit of time and wait for the object to be ready, with some generous logging, as this PR is doing.
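
   For reference, the wait-and-log approach can be sketched in isolation roughly as follows. This is only a minimal illustration, not the Manager code: `ReadyGate`, its millisecond timeouts, and the use of `System.err` in place of a real logger are all assumptions made so the example is self-contained, and it simply propagates `InterruptedException` rather than guessing how the PR handles it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder that hands out a value only after it has been published. */
final class ReadyGate<T> {
  private final CountDownLatch ready = new CountDownLatch(1);
  private final AtomicReference<T> ref = new AtomicReference<>();
  private final long firstWaitMillis;
  private final long repeatWaitMillis;

  ReadyGate(long firstWaitMillis, long repeatWaitMillis) {
    this.firstWaitMillis = firstWaitMillis;
    this.repeatWaitMillis = repeatWaitMillis;
  }

  /** Publish the value and release every waiting caller. */
  void set(T value) {
    ref.set(value);
    ready.countDown();
  }

  /** Block until the value is published, logging if the wait is longer than expected. */
  T get() throws InterruptedException {
    // Quiet initial wait; only start logging if the value is still not ready.
    if (!ready.await(firstWaitMillis, TimeUnit.MILLISECONDS)) {
      String who = "thread " + Thread.currentThread().getName();
      // A real logger would also be handed an exception here to record the caller's stack.
      System.err.println(who + " blocked until the value is ready");
      long waited = firstWaitMillis;
      // Keep waiting indefinitely, emitting a periodic reminder.
      while (!ready.await(repeatWaitMillis, TimeUnit.MILLISECONDS)) {
        waited += repeatWaitMillis;
        System.err.println(who + " still blocked after " + waited + " ms");
      }
      System.err.println(who + " no longer blocked");
    }
    return ref.get();
  }
}
```

   The caller's signature stays the same (it just gets the object back, possibly later than usual), which is why this avoids touching the RPC method signatures or the client-side retry loop.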



