ctubbsii commented on code in PR #2990:
URL: https://github.com/apache/accumulo/pull/2990#discussion_r988329538
##########
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java:
##########
@@ -225,6 +230,40 @@ public boolean stillManager() {
     return getManagerState() != ManagerState.STOP;
   }
+  /**
+   * Retrieve the Fate object, blocking until it is ready. This could cause problems if Fate
+   * operations are attempted to be used prior to the Manager being ready for them. If these
+   * operations are triggered by a client side request from a tserver or client, it should be safe
+   * to wait to handle those until Fate is ready, but if it occurs during an upgrade, or some other
+   * time in the Manager before Fate is started, that may result in a deadlock and will need to be
+   * fixed.
+   *
+   * @return the Fate object, only after the fate components are running and ready
+   */
+  Fate<Manager> fate() {
+    try {
+      // block up to 30 seconds until it's ready; if it's still not ready, introduce some logging
+      if (!fateReadyLatch.await(30, TimeUnit.SECONDS)) {
+        String msgPrefix = "Unexpected use of fate in thread " + Thread.currentThread().getName()
+            + " at time " + System.currentTimeMillis();
+        // include stack trace so we know where it's coming from, in case we need to troubleshoot it
+        log.warn("{} blocked until fate starts", msgPrefix,
+            new IllegalStateException("Attempted fate action before manager finished starting up; "
+                + "if this doesn't make progress, please report it as a bug to the developers"));
+        int minutes = 0;
+        while (!fateReadyLatch.await(5, TimeUnit.MINUTES)) {
+          minutes += 5;
+          log.warn("{} still blocked after {} minutes; this is getting weird", msgPrefix, minutes);
+        }
+        log.debug("{} no longer blocked", msgPrefix);
+      }
Review Comment:
I think catching the exception and throwing something the client code
should handle differently would be a much bigger change: it would involve
updating the RPC method signatures and adding client-side code to handle it
differently than the current loop does.

I would prefer to just give it a bit of time and wait for the object to be
ready, with some generous logging, as this PR does.
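For illustration, here is a minimal, self-contained sketch of the "wait with escalating logging" approach described above. `ReadyGate` is a hypothetical class, not part of Accumulo; it uses `java.util.logging` in place of the SLF4J logger the Manager uses, and it shows one possible way to handle `InterruptedException`, which falls outside the quoted hunk. It relies on the documented `CountDownLatch` guarantee that actions before `countDown()` happen-before a successful return from `await()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical holder (not Accumulo code): callers block in get() until set() is called once.
class ReadyGate<T> {
  private static final Logger LOG = Logger.getLogger(ReadyGate.class.getName());

  private final CountDownLatch readyLatch = new CountDownLatch(1);
  private final String name;
  private volatile T value;

  ReadyGate(String name) {
    this.name = name;
  }

  // Publish the value, then release all waiting threads.
  void set(T newValue) {
    value = newValue;
    readyLatch.countDown();
  }

  // Block until set() has been called; log only if the wait is unexpectedly long.
  T get() {
    try {
      if (!readyLatch.await(30, TimeUnit.SECONDS)) {
        String prefix = "Use of " + name + " in thread " + Thread.currentThread().getName();
        LOG.warning(prefix + " blocked until it is ready");
        int minutes = 0;
        // keep waiting indefinitely, but leave a periodic trail in the logs
        while (!readyLatch.await(5, TimeUnit.MINUTES)) {
          minutes += 5;
          LOG.warning(prefix + " still blocked after " + minutes + " minutes");
        }
        LOG.fine(prefix + " no longer blocked");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
      throw new IllegalStateException("Interrupted while waiting for " + name, e);
    }
    return value;
  }
}
```

In this sketch, startup code calls `set(...)` once the component is running, and request-handling threads call `get()`. Callers simply wait instead of receiving a new exception type, so no RPC signatures or client-side handling need to change.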