ctubbsii commented on pull request #1803:
URL: https://github.com/apache/accumulo/pull/1803#issuecomment-761136440


   > I think using the name of the tserver as the unique subdirectory under 
the WAL directory would work. As an added benefit, when the RecoveryManager 
sees that dir at the beginning of `recoveryLogs`, it could also check whether 
that tserver is still alive (this comes from the master, so it may be as simple 
as looking in a list of tservers). It would still be possible that the tserver 
is alive but the sort thread died or hung, or that the tserver was restarted 
and the sort process is gone. But we could print warnings for this case and 
start another sort either way.
   
   The hiccup I can see with this is that I don't know what the "name of the 
tserver" is. The most common identifier for a tserver is its hostname and port, 
but that pair is not necessarily unique over time (a tserver can fail and 
restart on the same host/port). I was actually thinking the tserver could 
generate a unique ID internally when it starts and use that ID to make unique 
directories. The unique ID could also be logged, sent in hello messages to the 
manager server, or stored in ZooKeeper locks for additional uses, but for this 
issue it would be enough to just log the directory that was created for 
recovery.
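   
   A minimal sketch of the startup-generated ID idea, assuming a random UUID 
is an acceptable source of uniqueness. The class name, method names, and the 
`<base>/<id>` directory layout are all hypothetical and are not taken from the 
Accumulo code base or from this PR.

```java
import java.util.UUID;

// Hypothetical helper; not an existing Accumulo class.
class RecoveryDirSketch {
  // Assumption: generated once per tserver process start, so a restart on the
  // same host/port gets a different value.
  private final String instanceId = UUID.randomUUID().toString();

  String getInstanceId() {
    return instanceId;
  }

  // Assumption: the unique directory is simply <walRecoveryDir>/<instanceId>.
  String uniqueRecoveryDir(String walRecoveryDir) {
    String base = walRecoveryDir.endsWith("/")
        ? walRecoveryDir.substring(0, walRecoveryDir.length() - 1)
        : walRecoveryDir;
    return base + "/" + instanceId;
  }
}
```

   The tserver would log the returned path once at startup, which is all this 
issue needs. Sending the ID to the manager or storing it in a lock could come 
later.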
   
   Once the uniqueness issue is ironed out, the SimpleGarbageCollector still 
needs to be updated and tested to ensure it properly cleans up the old 
directories. I mocked up a patch above for that, but it has not been tested.
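   
   To illustrate the kind of cleanup decision involved (this is not the 
mocked-up patch, and the criterion is purely an assumption), one option would 
be to treat a per-ID directory as a removal candidate when its ID is not in the 
set of IDs known to be live. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of SimpleGarbageCollector.
class StaleRecoveryDirSketch {
  // Returns the subdirectory names whose ID is absent from liveIds.
  // Assumption: each subdirectory is named exactly after the ID that created it.
  static List<String> staleDirs(List<String> subDirNames, Set<String> liveIds) {
    List<String> stale = new ArrayList<>();
    for (String name : subDirNames) {
      if (!liveIds.contains(name)) {
        stale.add(name);
      }
    }
    return stale;
  }
}
```

   A real implementation would also have to make sure no recovery still needs 
the sorted data in a candidate directory before deleting it.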


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

