dlmarion commented on PR #4562:
URL: https://github.com/apache/accumulo/pull/4562#issuecomment-2115112341

   > I started to comment on the loop over `getChildren`, where the lock data 
is read, with the following:
   > 
   > > Would it be worth it to wrap this call with another 
try...catch(Keeper.NO_NODE ex) to allow it to handle the case where the 
ephemeral lock was removed while in the main `getChildren` loop?  With no lock 
node, it could either delete the host:port then - or at least continue 
processing the other nodes.  As is, it will retry, but handling NO_NODE could  
make it more responsive by processing the remaining nodes in the list.
   > 
   > Taking no action and allowing that node to be processed on the next try 
would be safer.
   > 
   > But could there be a race condition between the server lock code and this 
cleaner? If the server lock creates the host:port node and then writes the 
lock, there will be a period where the lock does not exist, but host:port is 
expected to be there. What would happen if the cleaner deletes the host:port 
and then the server lock write is attempted?
   > 
   > It may be possible to use the creation time of the host:port node (ZK stat 
ctime) and check that it is older than the loop retry period. This would delay 
the removal for at least one cleaner cycle. Or, the service lock code could try 
to recreate the host:port node.
   
   Are you suggesting wrapping the following line with a try/catch to catch 
Keeper.NO_NODE? 
   ```
               byte[] lockData = 
ServiceLock.getLockData(getContext().getZooCache(), zLockPath, stat);
   ```
   
   I don't think that method throws that Exception.
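   
   For the ctime idea in the quoted comment, a minimal sketch of the age 
   check might look like the following. The class and method names are 
   hypothetical and not part of Accumulo. It assumes the creation time is in 
   milliseconds since the epoch, as returned by ZooKeeper's `Stat.getCtime()`, 
   and it ignores clock skew between the ZooKeeper server and the cleaner.
   
   ```java
   import java.time.Duration;

   /** Hypothetical helper for illustration only; not an Accumulo class. */
   final class HostPortCleanupGuard {
     private HostPortCleanupGuard() {}

     /**
      * Returns true only if the host:port node has existed for longer than one
      * cleaner retry period, so a node that was just created (and whose lock
      * child may not have been written yet) survives at least one cycle.
      *
      * @param ctimeMillis creation time of the host:port node in epoch millis
      *        (assumed to come from something like Stat.getCtime())
      * @param nowMillis current time in epoch millis
      * @param retryPeriod how long the cleaner waits between passes
      */
     static boolean oldEnoughToRemove(long ctimeMillis, long nowMillis, Duration retryPeriod) {
       return nowMillis - ctimeMillis > retryPeriod.toMillis();
     }
   }
   ```
   
   The cleaner would call this before deleting an empty host:port node and 
   skip the node when it returns false, leaving it for the next pass.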


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

