ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543628709
##########
File path:
core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
* specific language governing permissions and limitations
* under the License.
*/
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
import java.lang.Thread.UncaughtExceptionHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
-public class AccumuloUncaughtExceptionHandler implements
UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a
Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
- private static final Logger log =
LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+ private static final Logger LOG =
LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
@Override
public void uncaughtException(Thread t, Throwable e) {
- log.error(String.format("Caught an exception in %s. Shutting down.", t),
e);
+ if (e instanceof Exception) {
+ LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+ } else if (e instanceof Error) {
+ try {
+ e.printStackTrace();
+ System.err.println("Error thrown in thread: " + t + ", halting VM.");
+ } catch (Throwable e1) {
+ // If e == OutOfMemoryError, then it's probably that another Error
might be
+ // thrown when trying to print to System.err.
+ } finally {
+ Runtime.getRuntime().halt(-1);
Review comment:
I think we can't trust the TServer to shut down gracefully if we get to
this state, and adding code to categorize and handle different threads each
individually, would add too much (unmaintainable) complexity that would be
prone to accruing technical debt and drifting from the original intent over
time. I think it's simpler and safer to just stop everything in the TServer.
Yes, it prevents some graceful shutdown steps from occurring... but it also
prevents it from causing harm that is somewhat unpredictable. Rather than try
to write protective shutdown code in uncertain emergency circumstances, our
code complexity and development effort would be better spent on ensuring robust
recovery, in my opinion.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]