ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r543628709



##########
File path: 
core/src/main/java/org/apache/accumulo/core/util/threads/AccumuloUncaughtExceptionHandler.java
##########
@@ -16,20 +16,36 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-package org.apache.accumulo.core.util;
+package org.apache.accumulo.core.util.threads;
 
 import java.lang.Thread.UncaughtExceptionHandler;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-public class AccumuloUncaughtExceptionHandler implements 
UncaughtExceptionHandler {
+/**
+ * UncaughtExceptionHandler that logs all Exceptions and Errors thrown from a 
Thread. If an Error is
+ * thrown, halt the JVM.
+ *
+ */
+class AccumuloUncaughtExceptionHandler implements UncaughtExceptionHandler {
 
-  private static final Logger log = 
LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+  private static final Logger LOG = 
LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
 
   @Override
   public void uncaughtException(Thread t, Throwable e) {
-    log.error(String.format("Caught an exception in %s.  Shutting down.", t), 
e);
+    if (e instanceof Exception) {
+      LOG.error("Caught an Exception in {}. Thread is dead.", t, e);
+    } else if (e instanceof Error) {
+      try {
+        e.printStackTrace();
+        System.err.println("Error thrown in thread: " + t + ", halting VM.");
+      } catch (Throwable e1) {
+        // If e == OutOfMemoryError, then it's probably that another Error 
might be
+        // thrown when trying to print to System.err.
+      } finally {
+        Runtime.getRuntime().halt(-1);

Review comment:
       I think we can't trust the TServer to shut down gracefully if we get to 
this state, and adding code to categorize and handle different threads each 
individually, would add too much (unmaintainable) complexity that would be 
prone to accruing technical debt and drifting from the original intent over 
time. I think it's simpler and safer to just stop everything in the TServer. 
Yes, it prevents some graceful shutdown steps from occurring... but it also 
prevents it from causing harm that is somewhat unpredictable. Rather than try 
to write protective shutdown code in uncertain emergency circumstances, our 
code complexity and development effort would be better spent on ensuring robust 
recovery, in my opinion.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to