dlmarion commented on PR #4562:
URL: https://github.com/apache/accumulo/pull/4562#issuecomment-2115112341
> I started to comment on the `getChildren` loop where the lock data is read,
with the following:
>
> > Would it be worth wrapping this call with another
try...catch(Keeper.NO_NODE ex) to handle the case where the ephemeral lock
was removed while in the main `getChildren` loop? With no lock node, it could
either delete the host:port node at that point or at least continue processing
the other nodes. As is, it will retry, but handling NO_NODE could make it more
responsive by processing the remaining nodes in the list.
>
> Taking no action and allowing that node to be processed on the next try
would be safer.
>
> But could there be a race condition between the server lock code and this
cleaner? If the server lock creates the host:port node and then writes the lock,
there will be a period where the lock does not exist but host:port is expected
to be there. What would happen if the cleaner deletes the host:port node and
then the server lock write is attempted?
>
> It may be possible to use the creation time of the host:port node (ZK stat
ctime) and check that it is older than the loop retry period. This would delay
the removal for at least one cleaner cycle. Or, the service lock code could try
to recreate the host:port node.
Are you suggesting wrapping the following line with a try/catch to catch
Keeper.NO_NODE?
```
byte[] lockData =
    ServiceLock.getLockData(getContext().getZooCache(), zLockPath, stat);
```
I don't think that method throws that Exception.
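
For reference, the ctime check suggested in the quoted comment could look roughly like the sketch below. `StaleNodeCheck`, `isPastGracePeriod`, and the five-minute retry period are hypothetical names and values for illustration, not code from this PR. The sketch assumes the creation time comes from ZooKeeper's `Stat.getCtime()`, which is in milliseconds since the epoch.

```java
import java.time.Duration;

/** Hypothetical helper for illustration; not part of Accumulo or ZooKeeper. */
public final class StaleNodeCheck {

  private StaleNodeCheck() {}

  /**
   * Returns true when a node created at ctimeMillis is older than gracePeriod.
   * ctimeMillis is assumed to be the value of ZooKeeper's Stat.getCtime().
   */
  public static boolean isPastGracePeriod(long ctimeMillis, long nowMillis,
      Duration gracePeriod) {
    return nowMillis - ctimeMillis > gracePeriod.toMillis();
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    Duration retryPeriod = Duration.ofMinutes(5); // assumed cleaner retry period

    // A host:port node created one second ago would be skipped this cycle.
    System.out.println(isPastGracePeriod(now - 1_000L, now, retryPeriod));

    // A node created ten minutes ago would be eligible for removal.
    System.out.println(
        isPastGracePeriod(now - Duration.ofMinutes(10).toMillis(), now, retryPeriod));
  }
}
```

One caveat with this approach: ctime is assigned by the ZooKeeper server, so comparing it against the cleaner's local clock assumes the two clocks are reasonably in sync.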