EdColeman commented on issue #3138:
URL: https://github.com/apache/accumulo/issues/3138#issuecomment-1410935644

   Another dimension to this might occur if the dead server has just enough 
functionality to keep the ZooKeeper connection from timing out but is otherwise 
unable to fully receive or respond to ZooKeeper events.
   
   What would happen is that the zoo lock is deleted, which should force the 
tserver to stop hosting its tablets.  The manager sees the tablets unassigned 
and assigns them to another tserver.  If the original tserver does not realize 
that it should no longer be hosting the tablets, then both the original and 
the new server are serving the same tablets, which we assume will never 
happen.
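
   To make the sequence concrete, here is a toy model of it. This is only a 
sketch and not Accumulo code: the class `HalfDeadSketch`, the `TServer` type, 
the `receivesEvents` flag, and the server names are all hypothetical, and the 
real lock handling and tablet assignment are far more involved. It assumes 
the only way a tserver learns about the lost lock is through a delivered event.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the scenario described above. All names are hypothetical. */
final class HalfDeadSketch {

  /** A tablet server that keeps hosting until it is told its lock is gone. */
  static final class TServer {
    final String name;
    final boolean receivesEvents; // false models the "half dead" server
    boolean hosting;

    TServer(String name, boolean receivesEvents) {
      this.name = name;
      this.receivesEvents = receivesEvents;
    }

    /** Invoked when the lock node is deleted; a half-dead server never reacts. */
    void onLockDeleted() {
      if (receivesEvents) {
        hosting = false;
      }
    }
  }

  /** Returns the servers hosting the tablet after lock deletion and reassignment. */
  static List<String> hostsAfterLockLoss(boolean originalReceivesEvents) {
    TServer original = new TServer("tserver-1", originalReceivesEvents);
    TServer replacement = new TServer("tserver-2", true);
    original.hosting = true;

    // The zoo lock is deleted; only a healthy server acts on the event.
    original.onLockDeleted();

    // The manager sees the tablet as unassigned and assigns it elsewhere.
    replacement.hosting = true;

    List<String> hosts = new ArrayList<>();
    for (TServer s : List.of(original, replacement)) {
      if (s.hosting) {
        hosts.add(s.name);
      }
    }
    return hosts;
  }

  public static void main(String[] args) {
    System.out.println("healthy:   " + hostsAfterLockLoss(true));  // [tserver-2]
    System.out.println("half dead: " + hostsAfterLockLoss(false)); // [tserver-1, tserver-2]
  }
}
```

   In the half-dead case both servers end up in the list, which is the double 
hosting described above.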
   
   There is an integration test, HalfDeadTServerIT, that tries to test some of 
this, but I am not sure how much it actually covers.  I also recall past 
attempts to mock / wrap a ZooKeeper client to inject various errors, but I am 
unsure how far they progressed.
   
   Most of this may be outside of this issue (if the Fate command is 
insufficient) - but there may be other issues that should be looked at.

