EdColeman commented on issue #1689:
URL: https://github.com/apache/accumulo/issues/1689#issuecomment-694500736


   (Using the 1.10 code.) When the tserver gets into a bad state, it looks like 
zooCache may be returning null in the Tables.exists() check (Tables, line 
147).
   
   TabletServerResourceManager, line 451, catches Throwable and does nothing 
but log it.  That code runs in a continuous loop. I believe the code after 
the error is correctly guarded, but the loop will never end.
   
   I don't think that killing the runnable would work: the tserver might never 
notice that it lost the memory manager thread.
   
   I think ZooKeeper is available. How bad would it be if, on catching the 
exception, the code just deleted the tablet server lock and thereby killed the 
server?  That would be preferable to writing corrupt data, but maybe there are 
other "recoverable errors" that should not kill the server?
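
   To make the proposal concrete, here is a minimal sketch of the "fail fast" 
idea, not the actual TabletServerResourceManager code. Every name in it 
(MemoryManagerLoopSketch, Step, runLoop, onFatal) is hypothetical, and the 
onFatal callback only stands in for "delete the tablet server lock"; no 
Accumulo or ZooKeeper API is used. The iteration cap exists only so the demo 
terminates.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a background loop that stops and signals a fatal
// condition on the first Throwable, instead of logging and looping forever.
public class MemoryManagerLoopSketch {

    /** One pass of the background work (hypothetical stand-in). */
    public interface Step {
        void run() throws Exception;
    }

    /**
     * Runs step up to maxIterations times. On the first Throwable it logs,
     * invokes onFatal (a stand-in for releasing the server lock), and
     * leaves the loop. Returns the number of passes that completed.
     */
    public static int runLoop(Step step, Runnable onFatal, int maxIterations) {
        int completed = 0;
        while (completed < maxIterations) {
            try {
                step.run();
                completed++;
            } catch (Throwable t) {
                System.err.println("memory manager step failed: " + t);
                onFatal.run(); // assumption: this would delete the tserver lock
                return completed;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        boolean[] lockReleased = {false};

        // Simulate a step that fails on its third call.
        int done = runLoop(() -> {
            if (calls.incrementAndGet() == 3) {
                throw new IllegalStateException("zooCache returned null");
            }
        }, () -> lockReleased[0] = true, 10);

        System.out.println(done + " " + lockReleased[0]); // prints "2 true"
    }
}
```

   The open question from above still applies: a real version would need to 
decide which exceptions are "recoverable" and should be retried rather than 
passed to onFatal.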
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

