[
https://issues.apache.org/jira/browse/CURATOR-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833444#comment-13833444
]
Orcun Simsek commented on CURATOR-79:
-------------------------------------
We've run into this a few times in production, and our live workaround is to
kill the offending ZK session. We're currently looking to suppress the cause of
the interrupts, but are concerned that 1) we may not own all sources of
interruption, and 2) this deadlock can occur with any non-KeeperException, not
just an InterruptedException. A fix would be much appreciated, and we'll try to
put up a working patch as soon as possible.
> InterProcessMutex doesn't clean up after interrupt
> --------------------------------------------------
>
> Key: CURATOR-79
> URL: https://issues.apache.org/jira/browse/CURATOR-79
> Project: Apache Curator
> Issue Type: Bug
> Reporter: Orcun Simsek
> Assignee: Jordan Zimmerman
>
> InterProcessMutex can deadlock if a thread is interrupted during acquire().
> Specifically, CreateBuilderImpl.pathInForeground submits a create request to
> ZooKeeper, and an InterruptedException is thrown after the node is created in
> ZK but before ZK.create returns. ZK.create propagates a non-KeeperException,
> so Curator assumes the create has failed and does not retry, leaving the node
> orphaned. At some point in the future, the orphaned node becomes next in the
> acquisition sequence, but it is never reclaimed because the ZK session has
> not expired.
> <stack trace attached in comments below>
> Curator should catch the InterruptedException and other non-KeeperExceptions,
> and delete the created node before propagating these exceptions.
> (as originally discussed on
> https://groups.google.com/forum/#!topic/curator-users/9ii5of8SbdQ)
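
The cleanup proposed above can be illustrated with a minimal, self-contained
sketch. It does not use Curator or ZooKeeper: FakeZk is a hypothetical
in-memory stand-in, and acquireNode, createSequential and deleteMatching are
names invented for this illustration. The sketch also assumes the caller can
find its own node again by embedding a unique id in the node name, which the
issue text does not specify. It shows one possible shape of the fix, not the
patch that was eventually applied.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

public class OrphanedLockNodeSketch {

    /** Hypothetical in-memory stand-in for a ZooKeeper client (not a real API). */
    static class FakeZk {
        final ConcurrentSkipListSet<String> nodes = new ConcurrentSkipListSet<>();
        private final AtomicInteger sequence = new AtomicInteger();
        volatile boolean interruptAfterCreate = false;

        /** Creates a sequential node; may throw AFTER the node already exists. */
        String createSequential(String prefix) throws InterruptedException {
            String path = String.format("%s%010d", prefix, sequence.getAndIncrement());
            nodes.add(path);
            if (interruptAfterCreate) {
                // Simulates the reported window: node created, call never returns.
                throw new InterruptedException("interrupted before create returned");
            }
            return path;
        }

        /** Deletes every node whose path starts with the given prefix. */
        void deleteMatching(String prefix) {
            nodes.removeIf(node -> node.startsWith(prefix));
        }
    }

    /**
     * Creates a lock node and, if the create fails with anything that is not a
     * definite "node was not created" answer, deletes whatever was left behind
     * before rethrowing.
     */
    static String acquireNode(FakeZk zk, String lockDir) throws InterruptedException {
        // Assumption of this sketch: a caller-chosen id makes the node findable
        // even though the failed create never returned its path.
        String prefix = lockDir + "/" + UUID.randomUUID() + "-lock-";
        try {
            return zk.createSequential(prefix);
        } catch (InterruptedException | RuntimeException e) {
            zk.deleteMatching(prefix); // remove the possibly orphaned node
            throw e;                   // then propagate the original exception
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.interruptAfterCreate = true;
        try {
            acquireNode(zk, "/locks/demo");
        } catch (InterruptedException e) {
            System.out.println("create was interrupted: " + e.getMessage());
        }
        System.out.println("nodes left behind: " + zk.nodes.size());
    }
}
```

Without the catch block in acquireNode, the node added by createSequential
would stay in FakeZk.nodes after the exception, which mirrors the orphaned
node described in the issue.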