[ https://issues.apache.org/jira/browse/CURATOR-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090347#comment-14090347 ]

Cameron McKenzie commented on CURATOR-79:
-----------------------------------------

So, a fix for this is more complicated than I was hoping.

The recipes already use protected ephemeral nodes, so the normal case, where
you attempt to create a node and lose the connection after you've submitted the
create request but before you've received a response, is already covered. The
problem is that this only handles the ConnectionLoss exception. If any other
type of exception occurs, the logic to remove the potentially created ephemeral
node does not fire.
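For context, the idea behind protected nodes can be sketched roughly as
follows, assuming Curator-style names of the form `_c_<guid>-<name>`: the client
embeds a GUID it generated itself in the node name, so after an ambiguous create
it can list the parent's children and recognize its own node even though
ZooKeeper never returned the final (sequence-suffixed) name. `ProtectedNames`
and its methods are hypothetical, not Curator's API.

```java
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical helper illustrating the "protected" naming idea.
// The "_c_" prefix mirrors Curator's protected mode; treat the exact
// format here as illustrative.
final class ProtectedNames {
    static final String PREFIX = "_c_";

    // A fresh GUID per create attempt, generated on the client.
    static String newGuid() {
        return UUID.randomUUID().toString();
    }

    // Builds e.g. "_c_<guid>-lock-"; ZooKeeper appends the sequence number.
    static String protectedName(String guid, String baseName) {
        return PREFIX + guid + "-" + baseName;
    }

    // Finds the child carrying our GUID. Matching on the prefix (including the
    // trailing '-') works because the server-assigned suffix is unknown to us.
    static Optional<String> findOurNode(List<String> children, String guid) {
        String marker = PREFIX + guid + "-";
        for (String child : children) {
            if (child.startsWith(marker)) {
                return Optional.of(child);
            }
        }
        return Optional.empty();
    }
}
```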

It is possible, but a bit messy, to handle this at the LockInternals level. If
we get an exception while trying to create the znode, we can try to remove the
potentially created node, but we don't know its name. So we'd need to query all
the children of the parent lock path and then work out which ones are ephemeral
nodes owned by the current session that aren't known to the current lock
instance (i.e. they are orphans). I've implemented this, but it's a bit messy
and requires changes to the clients of LockInternals.

So, I think this needs some more thought. Perhaps the logic in the protected
node handling could be extended to fire on any non-KeeperException, in addition
to ConnectionLoss. Any thoughts from anyone else?
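A minimal sketch of that proposal, assuming hypothetical stand-ins (`KeeperEx`,
`ConnectionLossEx`, `Store`) for ZooKeeper's exception hierarchy and client so
it compiles without the ZooKeeper jar: cleanup fires on ConnectionLoss and on
any exception that is not a KeeperException (such as InterruptedException), and
the original exception is rethrown afterwards. `FlakyStore` is a hypothetical
test double that simulates the race from this issue, where the node is created
but the caller sees an exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for ZooKeeper's KeeperException hierarchy.
class KeeperEx extends Exception {
    KeeperEx(String msg) { super(msg); }
}

class ConnectionLossEx extends KeeperEx {
    ConnectionLossEx() { super("connection loss"); }
}

// Hypothetical minimal client interface (not Curator's or ZooKeeper's API).
interface Store {
    String createEphemeralSequential(String parent, String name) throws Exception;
    List<String> getChildren(String parent) throws Exception;
    void delete(String path) throws Exception;
}

final class GuardedCreate {
    // Creates parent/<protectedName><seq>. On ConnectionLoss or any non-KeeperEx
    // failure the node may exist although we never saw its name, so we find it
    // by its protected prefix and delete it before rethrowing.
    static String create(Store store, String parent, String protectedName) throws Exception {
        try {
            return store.createEphemeralSequential(parent, protectedName);
        } catch (Exception e) {
            boolean ambiguous = (e instanceof ConnectionLossEx) || !(e instanceof KeeperEx);
            if (ambiguous) {
                deleteIfPresent(store, parent, protectedName, e);
            }
            throw e;
        }
    }

    private static void deleteIfPresent(Store store, String parent,
                                        String protectedName, Exception cause) {
        try {
            for (String child : store.getChildren(parent)) {
                if (child.startsWith(protectedName)) {
                    store.delete(parent + "/" + child);
                }
            }
        } catch (Exception cleanupFailure) {
            cause.addSuppressed(cleanupFailure); // keep the original failure primary
        }
    }
}

// In-memory store: stores the node, then optionally throws instead of returning.
final class FlakyStore implements Store {
    final List<String> children = new ArrayList<>();
    Exception failAfterCreate; // thrown after the node is stored, if non-null
    private int seq = 0;

    public String createEphemeralSequential(String parent, String name) throws Exception {
        String child = name + String.format("%010d", seq++);
        children.add(child);
        if (failAfterCreate != null) throw failAfterCreate;
        return parent + "/" + child;
    }

    public List<String> getChildren(String parent) { return new ArrayList<>(children); }

    public void delete(String path) {
        children.remove(path.substring(path.lastIndexOf('/') + 1));
    }
}
```

Other KeeperExceptions are left alone here on the assumption that they are
definite answers from the server. A real implementation would probably also
need to retry the cleanup (for example in the background), since the delete can
itself fail while the connection is down.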

> InterProcessMutex doesn't clean up after interrupt
> --------------------------------------------------
>
>                 Key: CURATOR-79
>                 URL: https://issues.apache.org/jira/browse/CURATOR-79
>             Project: Apache Curator
>          Issue Type: Bug
>    Affects Versions: 2.0.0-incubating, 2.1.0-incubating, 2.2.0-incubating, 2.3.0
>            Reporter: Orcun Simsek
>            Assignee: Jordan Zimmerman
>
> InterProcessMutex can deadlock if a thread is interrupted during acquire().
> Specifically, CreateBuilderImpl.pathInForeground submits a create request to
> ZooKeeper, and if an InterruptedException is thrown after the node is created
> in ZK but before ZK.create returns, ZK.create propagates a non-KeeperException.
> Curator then assumes the create has failed and does not retry, so the node is
> orphaned. At some point in the future, the node becomes the next in the
> acquisition sequence, but it is never reclaimed because the ZK session has not
> expired.
> <stack trace attached in comments below>
> Curator should catch the InterruptedException and other non-KeeperExceptions, 
> and delete the created node before propagating these exceptions.
> (as originally discussed on 
> https://groups.google.com/forum/#!topic/curator-users/9ii5of8SbdQ)
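To see why the orphan deadlocks the mutex, here is a simplified model of the
lock ordering rule, assuming lock node names end in a 10-digit sequence number:
the holder is the child with the lowest sequence number, and every other client
waits on the child immediately before its own. Sorting is on the sequence
suffix because the protected prefixes differ between clients. `LockQueue` is
hypothetical, not Curator's `LockInternals`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the lock queue ordering.
final class LockQueue {
    // Assumes names end in a 10-digit sequence, e.g. "_c_<guid>-lock-0000000003".
    static int sequenceOf(String child) {
        return Integer.parseInt(child.substring(child.length() - 10));
    }

    // Returns the child that `ours` must wait on, or null if `ours` is at the
    // head of the queue (i.e. it holds the lock).
    static String predecessor(List<String> children, String ours) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(LockQueue::sequenceOf));
        int index = sorted.indexOf(ours);
        if (index < 0) {
            throw new IllegalArgumentException(ours + " is not in the queue");
        }
        return index == 0 ? null : sorted.get(index - 1);
    }
}
```

With an orphan `_c_a-lock-0000000003` and a live waiter `_c_b-lock-0000000004`,
`predecessor` returns null for the orphan (it "holds" the lock, but no thread
knows that) and returns the orphan for the waiter, which therefore waits until
the session expires.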



--
This message was sent by Atlassian JIRA
(v6.2#6252)
