ctubbsii commented on pull request #1803: URL: https://github.com/apache/accumulo/pull/1803#issuecomment-761136440
> I think using the name of the tserver as the unique sub directory under the WAL directory would work. As an added benefit, when the RecoveryManager sees that dir at the beginning of `recoveryLogs` it could also check if that tserver is still alive (coming from the master so this may be as simple as looking in a list of tservers). It would still be possible that the tserver is alive and the thread died/hung or it was restarted so that sort process is gone. But we could print warnings for this case and start another sort either way.

The hiccup I see with this is that I don't know what the "name of the tserver" would be. The most common identifier for a tserver is its hostname and port, but that pair is not necessarily unique: a tserver can fail and restart on the same host and port.

I was actually just thinking that the tserver could generate a unique ID internally when it starts, and that ID could be used to create unique directories. The unique ID could also be logged, sent in hello messages to the manager server, or stored in ZooKeeper locks for other uses. For this issue, though, it would be enough to log the directory the tserver created for recovery.

Once the uniqueness issue is ironed out, the SimpleGarbageCollector still needs to be updated and tested to ensure it properly cleans up the old directories. I mocked up a patch above for that, but it has not been tested.
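The following is a minimal sketch of the per-process unique ID idea. It shows why host:port alone collides across restarts while a startup-generated ID does not. It assumes the ID is a random `java.util.UUID` created once per tserver start. `TServerInstance`, `recoveryDir`, the `/accumulo/wal` path, and the `recovery/<host_port>-<uuid>` layout are hypothetical illustrations. They are not Accumulo's actual API or directory structure.

```java
import java.util.UUID;

public class RecoveryDirNaming {

  // Hypothetical: identity of one tserver *process*, not just its address.
  static final class TServerInstance {
    final String hostAndPort;
    final UUID instanceId;

    TServerInstance(String hostAndPort) {
      this.hostAndPort = hostAndPort;
      // Generated once at startup; a restart on the same host:port gets a new ID.
      this.instanceId = UUID.randomUUID();
    }

    // Hypothetical layout: <walDir>/recovery/<host_port>-<uuid>
    // Keeping host:port in the name is only for human readability; the UUID
    // is what makes the directory unique.
    String recoveryDir(String walDir) {
      String safeHost = hostAndPort.replace(':', '_');
      return walDir + "/recovery/" + safeHost + "-" + instanceId;
    }
  }

  public static void main(String[] args) {
    String walDir = "/accumulo/wal"; // assumed path, for illustration only
    TServerInstance first = new TServerInstance("tserver1.example.com:9997");
    // Simulate the tserver failing and restarting on the same host and port.
    TServerInstance restarted = new TServerInstance("tserver1.example.com:9997");

    String dir1 = first.recoveryDir(walDir);
    String dir2 = restarted.recoveryDir(walDir);

    // Logging the directory is what the comment suggests is sufficient here.
    System.out.println("recovery dir: " + dir1);
    System.out.println("recovery dir after restart: " + dir2);
    System.out.println("distinct: " + !dir1.equals(dir2));
  }
}
```

Embedding host:port next to the UUID is a design choice, not a requirement. It lets an operator, or a RecoveryManager-style liveness check, see which host owned a directory, while the UUID alone guarantees uniqueness.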
