EdColeman commented on PR #4562: URL: https://github.com/apache/accumulo/pull/4562#issuecomment-2113508506
I started to comment on the loop where the lock data is read from `getChildren`, with the following:

> Would it be worth wrapping this call in another `try...catch(Keeper.NO_NODE ex)` so it can handle the case where the ephemeral lock is removed while in the main `getChildren` loop? With no lock node, it could either delete the host:port node then, or at least continue processing the other nodes. As is, it will retry, but handling NO_NODE could make it more responsive by processing the remaining nodes in the list.

Taking no action and allowing that node to be processed on the next try would be safer. But could there be a race condition between the server lock code and this cleaner? If the server lock code creates the host:port node and then writes the lock, there will be a period where the lock does not exist but host:port is expected to be there. What would happen if the cleaner deletes host:port and then the server lock write is attempted?

It may be possible to use the creation time of the host:port node (ZK stat ctime) and check that it is older than the loop retry period. This would delay the removal for at least one cleaner cycle. Or, the service lock code could try to recreate the host:port node.
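A rough sketch of the two ideas combined, handling a missing node inside the per-node loop so the remaining host:port nodes are still processed, and only treating a lock-less host:port node as stale once its ctime is older than one retry period. Everything here is hypothetical: `LockCleanerSketch`, `ZkView`, `cleanOnce`, `isOldEnough` and the in-memory tree are illustrative stand-ins, not Accumulo or ZooKeeper APIs, and `NoNodeException` only mimics ZooKeeper's `KeeperException.NoNodeException`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one cleaner pass over host:port nodes; none of these
// names are Accumulo or ZooKeeper APIs.
class LockCleanerSketch {

  // Stand-in for ZooKeeper's KeeperException.NoNodeException. The real one is
  // checked; it is unchecked here only to keep the sketch short.
  static class NoNodeException extends RuntimeException {
    NoNodeException(String path) { super(path); }
  }

  // Minimal view of the tree the cleaner needs (assumed, not a real interface).
  interface ZkView {
    List<String> getChildren(String path);
    long getCtime(String path); // creation time in ms, like the ctime in ZK's Stat
    void delete(String path);
  }

  // Age guard: a lock-less host:port node only counts as stale once it is older
  // than one retry period, giving the server time to write its lock.
  static boolean isOldEnough(long ctimeMillis, long nowMillis, long retryPeriodMillis) {
    return nowMillis - ctimeMillis > retryPeriodMillis;
  }

  // One pass. A NoNodeException on a single host:port node is caught inside the
  // loop, so the remaining nodes are still processed. Returns the deleted paths.
  static List<String> cleanOnce(ZkView zk, String root, long nowMillis, long retryPeriodMillis) {
    List<String> deleted = new ArrayList<>();
    for (String hostPort : zk.getChildren(root)) {
      String hostPath = root + "/" + hostPort;
      try {
        if (!zk.getChildren(hostPath).isEmpty()) {
          continue; // lock node present: leave it alone
        }
        if (!isOldEnough(zk.getCtime(hostPath), nowMillis, retryPeriodMillis)) {
          continue; // too new: the server may still be about to write its lock
        }
        zk.delete(hostPath);
        deleted.add(hostPath);
      } catch (NoNodeException e) {
        // The node vanished mid-loop; skip it and keep going.
      }
    }
    return deleted;
  }

  // Toy in-memory tree so the logic can be tried without a ZooKeeper server.
  static class InMemoryZk implements ZkView {
    final Map<String, Long> ctimes = new HashMap<>();
    final Map<String, List<String>> children = new HashMap<>();

    void create(String path, long ctime) {
      ctimes.put(path, ctime);
      children.put(path, new ArrayList<>());
      int slash = path.lastIndexOf('/');
      if (slash > 0) children.get(path.substring(0, slash)).add(path.substring(slash + 1));
    }

    @Override
    public List<String> getChildren(String path) {
      List<String> c = children.get(path);
      if (c == null) throw new NoNodeException(path);
      return new ArrayList<>(c); // copy, so callers can delete while iterating
    }

    @Override
    public long getCtime(String path) {
      Long t = ctimes.get(path);
      if (t == null) throw new NoNodeException(path);
      return t;
    }

    @Override
    public void delete(String path) {
      if (ctimes.remove(path) == null) throw new NoNodeException(path);
      children.remove(path);
      int slash = path.lastIndexOf('/');
      if (slash > 0) {
        List<String> parent = children.get(path.substring(0, slash));
        if (parent != null) parent.remove(path.substring(slash + 1));
      }
    }
  }
}
```

The age guard narrows the window but does not close it: a server that stalls between creating host:port and writing its lock for longer than one retry period can still lose its node, which is where the other option (having the service lock code recreate host:port) would help. With a real ZooKeeper `delete`, a lock written between the check and the delete would make the delete fail with `NotEmptyException` rather than silently remove the lock.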
