EdColeman commented on issue #1689:
URL: https://github.com/apache/accumulo/issues/1689#issuecomment-696434005


   Overall, there may be a difference in philosophy or focus.  I do not 
disagree with the premises of anything that you wrote - except I think I'm 
coming at it from a different angle.
   
   I'm aware of the classloader rework - that's one reason I was less focused 
on the "how it got that way" and am trying to address "no matter how we got 
here, can we at least stop writing corrupt files".
   
   Whatever the root cause, we should try to protect ourselves, and if we can't 
recover, then at least stop corrupting data.  As this has unfolded, it seems 
very likely that things are pointing to something external to Accumulo. But I 
think it's a bug that Accumulo keeps working (and working incorrectly).  It 
should be a given that the hardware works - but it is impossible to provide 
that guarantee - things go wrong.
   
   I agree that catching Throwable may not always be appropriate - in this case 
it is not - so, for this one case, is there an acceptable solution?  I've 
proposed one way.  
   
   A second, and more general way could be to leverage 
`Thread.UncaughtExceptionHandler` - using that, the tserver could create the 
threads and assign a handler that would do essentially the same thing - stop 
the tserver, either by deleting the lock or whatever the preferred mechanism 
is, if the underlying "critical" thread dies.  Then we don't need to guard 
against unexpected exceptions - we let them kill the thread - and then decide 
to either kill the tserver or maybe spawn a new thread - if that could be 
determined to be appropriate and safe.  
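   
   To make the `Thread.UncaughtExceptionHandler` idea concrete, here is a 
minimal, standalone sketch. It is not Accumulo code: `CriticalThreadSketch`, 
`ShutdownAction`, `newCriticalThread` and the thread name are hypothetical 
names made up for illustration, and the "shutdown" here only records the 
failure where a tserver would delete its lock or use whatever the preferred 
mechanism is.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class CriticalThreadSketch {

  /** Hypothetical hook: a tserver might delete its lock or halt here. */
  interface ShutdownAction {
    void shutdown(String threadName, Throwable cause);
  }

  /**
   * Creates (but does not start) a thread whose death from an uncaught
   * Throwable triggers the given shutdown action.
   */
  static Thread newCriticalThread(String name, Runnable task, ShutdownAction action) {
    Thread t = new Thread(task, name);
    // The handler runs on the dying thread, after the Throwable has escaped run().
    t.setUncaughtExceptionHandler((thread, cause) -> action.shutdown(thread.getName(), cause));
    return t;
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch stopRequested = new CountDownLatch(1);
    AtomicReference<Throwable> seen = new AtomicReference<>();

    // Stand-in for "stop the tserver": record the cause and signal the main thread.
    ShutdownAction action = (threadName, cause) -> {
      seen.set(cause);
      System.err.println("critical thread '" + threadName + "' died: " + cause);
      stopRequested.countDown();
    };

    // The task does not catch Throwable; an unexpected Error simply kills the thread.
    Thread t = newCriticalThread("critical-writer", () -> {
      throw new Error("simulated unexpected failure");
    }, action);
    t.start();

    boolean fired = stopRequested.await(5, TimeUnit.SECONDS);
    System.out.println("shutdown requested: " + fired + ", cause: " + seen.get());
  }
}
```

   With this shape the task itself has no catch-all block; the decision to 
stop (or, if it could be shown to be safe, to respawn the thread) lives in one 
place, the handler.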
   
   1) So, in general - for cases where we are catching `Throwable` and that is 
causing issues - would it be better if we stopped the tserver?
   
   2) If it is determined that we want to stop, is deleting the lock 
acceptable, or is there a preferred, alternate method?
   
   While catching and swallowing Throwable in general is a bigger issue - for 
this one case, where we can identify that it is not appropriate - can we fix 
that and then examine the larger issue as time allows or when other cases are 
identified?



