keith-turner commented on a change in pull request #1646:
URL: https://github.com/apache/accumulo/pull/1646#discussion_r448499885
##########
File path:
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactor.java
##########
@@ -148,6 +149,13 @@ public CompactionStats call() {
reportedProblem = true;
} catch (CompactionCanceledException e) {
throw new IllegalStateException(e);
+ } catch (Throwable t) {
Review comment:
> Now I am thinking that a Halt just does not feel right, and I fear that given the appropriate circumstances could result in a cascade of tserver deaths.

That could happen. Part of the problem is that some errors are benign while others likely indicate a catastrophe. Maybe configuration is the best option; there could be three configuration items:
* A configurable list of errors to retry on.
* A configurable list of errors to halt on.
* The action to take for errors that fall in neither list: hang, retry, or halt.

This way, when a new error is encountered, like the one in #1644, its class could simply be added to the retry config.
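
A minimal sketch of how the three configuration items might be evaluated, assuming each list has already been parsed into a set of fully qualified class names. `ErrorPolicy`, `decide`, and the `Action` values are hypothetical names for illustration, not existing Accumulo properties or classes. Matching walks the superclass chain, so an entry such as `java.lang.LinkageError` would also cover `NoClassDefFoundError`.

```java
import java.util.Set;

// Hypothetical sketch of the proposed config; not a real Accumulo class.
public class ErrorPolicy {

  /** Action for an error whose class is in neither configured list. */
  public enum Action { HANG, RETRY, HALT }

  private final Set<String> retryOn; // class names from a hypothetical "retry on" property
  private final Set<String> haltOn;  // class names from a hypothetical "halt on" property
  private final Action defaultAction;

  public ErrorPolicy(Set<String> retryOn, Set<String> haltOn, Action defaultAction) {
    this.retryOn = Set.copyOf(retryOn);
    this.haltOn = Set.copyOf(haltOn);
    this.defaultAction = defaultAction;
  }

  /**
   * Walks from the error's own class up through its superclasses. The most
   * specific match wins; if one class is in both lists, halt takes precedence.
   */
  public Action decide(Throwable t) {
    for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
      String name = c.getName();
      if (haltOn.contains(name)) {
        return Action.HALT;
      }
      if (retryOn.contains(name)) {
        return Action.RETRY;
      }
    }
    return defaultAction;
  }
}
```

With `java.lang.NoClassDefFoundError` (or `java.lang.LinkageError`) in the retry set, the #1644 case would become a config change rather than a code change.
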
> It could be that all of these changes go away and I simply catch the NoClassDefFoundError and call it a day.

Maybe a follow-on issue would be best. I feel this is a wider issue: any thread could encounter an error, so having a single mechanism for handling unexpected errors in server processes seems useful.
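
One possible shape for that single mechanism, as a sketch only: explicit `catch (Throwable t)` blocks (like the one added in `MinorCompactor.call()`) and the JDK's default uncaught exception handler both route into one pluggable policy. `UnexpectedErrorHandler`, `setPolicy`, `handle`, and `install` are hypothetical names; `Thread.setDefaultUncaughtExceptionHandler` is the standard JDK method.

```java
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical sketch of one process-wide entry point for unexpected errors.
public final class UnexpectedErrorHandler {

  // Pluggable policy (could be driven by config); defaults to printing the stack trace.
  private static volatile Consumer<Throwable> policy = Throwable::printStackTrace;

  private UnexpectedErrorHandler() {}

  /** Replaces the policy applied to every unexpected error. */
  public static void setPolicy(Consumer<Throwable> newPolicy) {
    policy = Objects.requireNonNull(newPolicy);
  }

  /** Intended to be called from catch (Throwable t) blocks in server code. */
  public static void handle(Throwable t) {
    policy.accept(t);
  }

  /** Routes errors that escape any thread's run() to the same policy. */
  public static void install() {
    Thread.setDefaultUncaughtExceptionHandler((thread, t) -> handle(t));
  }
}
```

The point of the indirection is that the decision (hang, retry, halt, count) lives in one place instead of being repeated in each thread's catch block.
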

I am also wondering whether emitting metrics for errors would be useful. If the cardinality is deemed low enough, we could emit a metric for each error class name. The metrics system would then show when a tserver has an out-of-memory error. This could be a follow-on issue if it seems useful.
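
A sketch of per-class error counts, assuming a plain in-process counter map rather than any particular metrics library; `ErrorCounters`, `maxNames`, and the `"other"` bucket are hypothetical. Capping the number of distinct names is one way to keep cardinality bounded if it turns out not to be low enough.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: one counter per error class name, with a cardinality cap.
public final class ErrorCounters {
  static final String OTHER = "other";

  private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();
  private final int maxNames;

  public ErrorCounters(int maxNames) {
    this.maxNames = maxNames;
  }

  /** Counts the error under its class name, or under "other" once the cap is hit. */
  public void record(Throwable t) {
    String key = t.getClass().getName();
    // Roughly maxNames names plus the overflow bucket; the check is approximate under concurrency.
    if (!counts.containsKey(key) && counts.size() >= maxNames) {
      key = OTHER;
    }
    counts.computeIfAbsent(key, k -> new LongAdder()).increment();
  }

  public long count(String className) {
    LongAdder a = counts.get(className);
    return a == null ? 0 : a.sum();
  }

  /** Sorted copy of the current counts, e.g. for export to a metrics system. */
  public Map<String, Long> snapshot() {
    Map<String, Long> out = new TreeMap<>();
    counts.forEach((k, v) -> out.put(k, v.sum()));
    return out;
  }
}
```
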