[ https://issues.apache.org/jira/browse/CURATOR-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Orcun Simsek updated CURATOR-79:
--------------------------------
Description:
InterProcessMutex can deadlock if a thread is interrupted during acquire().
Specifically, CreateBuilderImpl.pathInForeground submits a create request to
ZooKeeper, and an InterruptedException is thrown after the node is created in
ZK but before ZK.create returns. Because ZK.create propagates a
non-KeeperException, Curator treats the create as failed and does not retry,
leaving the node orphaned. Later, the orphaned node becomes the next in the
acquisition sequence, but it is never deleted because the ZK session has not
expired, so every other waiter blocks behind it.
<stack trace attached in comments below>
Curator should catch the InterruptedException and other non-KeeperExceptions,
and delete the created node before propagating these exceptions.
(as originally discussed on
https://groups.google.com/forum/#!topic/curator-users/9ii5of8SbdQ)
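
A rough sketch of the proposed cleanup, for illustration only. It does not use
the real ZooKeeper or Curator classes: FakeZk, createSequential and
createWithCleanup are hypothetical names, and the idea of tagging the node name
with a unique id (so it can be found again when the sequential path was never
returned) is an assumption of this sketch, not a description of Curator's
internals.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicInteger;

public class InterruptSafeCreate {

    /** Hypothetical in-memory stand-in for a ZooKeeper client; not the real API. */
    static class FakeZk {
        final Set<String> nodes = new ConcurrentSkipListSet<>();
        private final AtomicInteger sequence = new AtomicInteger();
        volatile boolean interruptAfterCreate;

        /** Creates a sequential node, then optionally simulates the interrupt. */
        String createSequential(String prefix) throws InterruptedException {
            String path = prefix + String.format("%010d", sequence.getAndIncrement());
            nodes.add(path); // the "server" has created the node at this point
            if (interruptAfterCreate) {
                // The caller never learns the path, as in the reported scenario.
                throw new InterruptedException("interrupted after server-side create");
            }
            return path;
        }
    }

    /**
     * Creates a node and, if a non-KeeperException style failure escapes,
     * deletes whatever node carries this call's unique id before rethrowing.
     */
    static String createWithCleanup(FakeZk zk, String parent, String name)
            throws InterruptedException {
        String id = UUID.randomUUID().toString(); // assumption: id embedded in the name
        String prefix = parent + "/_c_" + id + "-" + name;
        try {
            return zk.createSequential(prefix);
        } catch (InterruptedException | RuntimeException e) {
            // The node may or may not exist; remove it if it does, then propagate.
            zk.nodes.removeIf(path -> path.contains(id));
            throw e;
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.interruptAfterCreate = true;
        try {
            createWithCleanup(zk, "/locks", "lock-");
        } catch (InterruptedException e) {
            System.out.println("create interrupted, nodes left behind: " + zk.nodes.size());
        }
    }
}
```

With the cleanup in place the interrupted create leaves no node behind in the
fake store. Against a real ZooKeeper the delete would itself be a network call
made on an interrupted thread, so it would likely need to run in the background
or after clearing and restoring the interrupt flag.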
was:
InterProcessMutex can deadlock if a thread is interrupted during acquire().
Specifically, CreateBuilderImpl.pathInForeground submits a create request to
ZooKeeper, and an InterruptedException is thrown after the node is created in
ZK but before ZK.create returns. ZK.create propagates a non-KeeperException, so
Curator assumes the create has failed, but does not retry, and the node is now
orphaned. At some point in the future, the node becomes the next in the
acquisition sequence, but is not reclaimed as the ZK session has not expired.
{code}
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:781)
    at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:625)
    at com.netflix.curator.framework.imps.CreateBuilderImpl$10.call(CreateBuilderImpl.java:609)
    at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
    at com.netflix.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:605)
    at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:428)
    at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:408)
    at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:41)
    at com.netflix.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:222)
    at com.netflix.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:218)
    at com.netflix.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:74)
    at com.palantir.finance.server.service.storage.CuratorLockTests.testInterruptDeadlock(CuratorLockTests.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
{code}
Curator should catch the InterruptedException and other non-KeeperExceptions,
and delete the created node before propagating these exceptions.
(as originally discussed on
https://groups.google.com/forum/#!topic/curator-users/9ii5of8SbdQ)
> InterProcessMutex doesn't clean up after interrupt
> --------------------------------------------------
>
> Key: CURATOR-79
> URL: https://issues.apache.org/jira/browse/CURATOR-79
> Project: Apache Curator
> Issue Type: Bug
> Reporter: Orcun Simsek
> Assignee: Jordan Zimmerman