ctubbsii commented on a change in pull request #1818:
URL: https://github.com/apache/accumulo/pull/1818#discussion_r532883229
##########
File path:
core/src/main/java/org/apache/accumulo/core/util/AccumuloUncaughtExceptionHandler.java
##########
@@ -26,10 +26,20 @@
public class AccumuloUncaughtExceptionHandler implements
UncaughtExceptionHandler {
private static final Logger log =
LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+ private static final String HALT_PROPERTY = "HaltVMOnThreadError";
@Override
public void uncaughtException(Thread t, Throwable e) {
- log.error(String.format("Caught an exception in %s. Shutting down.", t),
e);
+ if (e instanceof Exception) {
+ log.error(String.format("Caught an exception in %s. Thread is dead.",
t), e);
Review comment:
```suggestion
log.error("Caught an exception in {}. Thread is dead.", t, e);
```
##########
File path: server/manager/src/main/java/org/apache/accumulo/master/Master.java
##########
@@ -884,8 +886,9 @@ private long balanceTablets() {
gatherTableInformation(Set<TServerInstance> currentServers) {
final long rpcTimeout =
getConfiguration().getTimeInMillis(Property.GENERAL_RPC_TIMEOUT);
int threads =
getConfiguration().getCount(Property.MASTER_STATUS_THREAD_POOL_SIZE);
- ExecutorService tp =
- threads == 0 ? Executors.newCachedThreadPool() :
Executors.newFixedThreadPool(threads);
+ ExecutorService tp = threads == 0
+ ? Executors.newCachedThreadPool(new
NamingThreadFactory("GatherTableInformation"))
+ : Executors.newFixedThreadPool(threads, new
NamingThreadFactory("GatherTableInformation"));
Review comment:
This is probably a little easier to read if you assign the ThreadFactory
to a variable first, like the following (formatting probably isn't right):
```suggestion
var threadFactory = new NamingThreadFactory("GatherTableInformation");
ExecutorService tp = threads == 0
? Executors.newCachedThreadPool(threadFactory)
: Executors.newFixedThreadPool(threads, threadFactory);
```
##########
File path:
core/src/main/java/org/apache/accumulo/core/util/AccumuloUncaughtExceptionHandler.java
##########
@@ -26,10 +26,20 @@
public class AccumuloUncaughtExceptionHandler implements
UncaughtExceptionHandler {
private static final Logger log =
LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+ private static final String HALT_PROPERTY = "HaltVMOnThreadError";
@Override
public void uncaughtException(Thread t, Throwable e) {
- log.error(String.format("Caught an exception in %s. Shutting down.", t),
e);
+ if (e instanceof Exception) {
+ log.error(String.format("Caught an exception in %s. Thread is dead.",
t), e);
+ } else {
+ if (System.getProperty(HALT_PROPERTY, "false").equals("true")) {
+ log.error(String.format("Caught an exception in %s.", t), e);
Review comment:
We don't need String.format here. The slf4j logger supports substitution
with `{}`
```suggestion
log.error("Caught an exception in {}.", t, e);
```
##########
File path: assemble/conf/accumulo-env.sh
##########
@@ -95,7 +95,7 @@ JAVA_OPTS=("${JAVA_OPTS[@]}"
case "$cmd" in
monitor|gc|master|tserver|tracer)
- JAVA_OPTS=("${JAVA_OPTS[@]}"
"-Dlog4j.configurationFile=log4j2-service.properties")
+ JAVA_OPTS=("${JAVA_OPTS[@]}"
"-Dlog4j.configurationFile=log4j2-service.properties"
"-DHaltVMOnThreadError=true")
Review comment:
Interesting strategy. I wonder if the way you're handling this bypasses
our setting for `'-XX:OnOutOfMemoryError=kill -9 %p'` in these threads.
##########
File path:
server/base/src/main/java/org/apache/accumulo/server/util/time/SimpleCriticalTimer.java
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.server.util.time;
+
+import java.lang.Thread.UncaughtExceptionHandler;
+
+import org.apache.accumulo.core.conf.AccumuloConfiguration;
+import org.apache.accumulo.core.conf.Property;
+import org.apache.accumulo.core.util.AccumuloUncaughtExceptionHandler;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Generic singleton timer for critical tasks.
+ */
+public class SimpleCriticalTimer extends SimpleTimer {
Review comment:
What differentiates this from `SimpleTimer`? Could improve the javadoc
here with a high-level explanation.
##########
File path:
core/src/main/java/org/apache/accumulo/core/util/AccumuloUncaughtExceptionHandler.java
##########
@@ -26,10 +26,20 @@
public class AccumuloUncaughtExceptionHandler implements
UncaughtExceptionHandler {
private static final Logger log =
LoggerFactory.getLogger(AccumuloUncaughtExceptionHandler.class);
+ private static final String HALT_PROPERTY = "HaltVMOnThreadError";
@Override
public void uncaughtException(Thread t, Throwable e) {
- log.error(String.format("Caught an exception in %s. Shutting down.", t),
e);
+ if (e instanceof Exception) {
+ log.error(String.format("Caught an exception in %s. Thread is dead.",
t), e);
+ } else {
+ if (System.getProperty(HALT_PROPERTY, "false").equals("true")) {
+ log.error(String.format("Caught an exception in %s.", t), e);
+ Halt.halt(String.format("Caught an exception in %s. Halting VM, check
the logs.", t));
+ } else {
+ log.error(String.format("Caught an exception in %s. Thread is dead.",
t), e);
Review comment:
```suggestion
log.error("Caught an exception in {}. Thread is dead.", t, e);
```
##########
File path:
server/base/src/main/java/org/apache/accumulo/server/util/time/SimpleTimer.java
##########
@@ -100,10 +93,19 @@ static int getInstanceThreadPoolSize() {
return instanceThreadPoolSize;
}
- private SimpleTimer(int threadPoolSize) {
+ protected SimpleTimer(int threadPoolSize) {
executor = Executors.newScheduledThreadPool(threadPoolSize,
new
ThreadFactoryBuilder().setNameFormat("SimpleTimer-%d").setDaemon(true)
- .setUncaughtExceptionHandler(new ExceptionHandler()).build());
+
.setUncaughtExceptionHandler(getUncaughtExceptionHandler()).build());
+ }
+
+ protected Thread.UncaughtExceptionHandler getUncaughtExceptionHandler() {
+ return new Thread.UncaughtExceptionHandler() {
+ @Override
+ public void uncaughtException(Thread t, Throwable e) {
+ log.warn("SimpleTimer task failed", e);
+ }
+ };
Review comment:
I'm curious what non-critical use cases we still have for this, that
would warrant continuing to swallow OOM errors and such.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]