[ https://issues.apache.org/jira/browse/CURATOR-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen Ingram updated CURATOR-205:
-----------------------------------
Summary: Repeated InterruptedExceptions during mutex acquire leads to
LeaderSelector deadlock (was: Repeated InterruptedExceptions during mutex
aquire leads to LeaderSelector deadlock)
> Repeated InterruptedExceptions during mutex acquire leads to LeaderSelector
> deadlock
> ------------------------------------------------------------------------------------
>
> Key: CURATOR-205
> URL: https://issues.apache.org/jira/browse/CURATOR-205
> Project: Apache Curator
> Issue Type: Bug
> Components: Recipes
> Affects Versions: 2.7.2
> Reporter: Stephen Ingram
>
> When an InterruptedException is thrown in internalLockLoop, which is called
> during mutex.acquire, internalLockLoop sets a "doDelete" flag that tells its
> finally clause to delete the lock path that we were trying to create.
> However, in the pathInForeground function of DeleteBuilderImpl, a _second_
> InterruptedException may occur before ZooKeeper can delete the specified
> path. The RetryLoop machinery in that function retries only on retryable
> exceptions, a category that does not include InterruptedException.
> The second InterruptedException then causes pathInForeground to exit
> without deleting the path, leading to a deadlock in which no one can
> acquire the mutex.
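
The failure mode described above can be reproduced in miniature. The following is only a sketch under stated assumptions: it does not use Curator at all, the "namespace" is an in-memory set, and every name in it (StaleLockNodeDemo, RetryableException, callWithRetry, deletePath, acquire, leaked) is hypothetical. It only mirrors the shape of the problem: a doDelete flag honoured in a finally clause, and a retry loop that lets InterruptedException escape.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the recipe code; none of these names are Curator's API.
public class StaleLockNodeDemo {
    // In-memory stand-in for the ZooKeeper namespace.
    static final Set<String> nodes = ConcurrentHashMap.newKeySet();

    // Hypothetical marker for errors the retry loop treats as retryable.
    static class RetryableException extends Exception {}

    interface Op { void run() throws Exception; }

    // Retries only RetryableException; anything else, including
    // InterruptedException, propagates to the caller immediately.
    static void callWithRetry(Op op, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                op.run();
                return;
            } catch (RetryableException e) {
                if (attempt >= maxRetries) throw e;
            }
        }
    }

    // Simulated delete that is interrupted interruptsLeft[0] times before it can succeed.
    static void deletePath(String path, int[] interruptsLeft) throws Exception {
        callWithRetry(() -> {
            if (interruptsLeft[0] > 0) {
                interruptsLeft[0]--;
                throw new InterruptedException("interrupted during delete");
            }
            nodes.remove(path);
        }, 3);
    }

    // Simulated wait for the lock that is always interrupted (the first interruption).
    static void waitForLock() throws InterruptedException {
        throw new InterruptedException("interrupted while waiting for the lock");
    }

    // Creates the lock node, gets interrupted, then tries to clean up in finally.
    static void acquire(String path, int interruptsDuringDelete) throws Exception {
        boolean doDelete = false;
        nodes.add(path);
        try {
            waitForLock();
        } catch (Exception e) {
            doDelete = true;
            throw e;
        } finally {
            if (doDelete) deletePath(path, new int[] {interruptsDuringDelete});
        }
    }

    // Returns true if the lock node is still present after a failed acquire.
    public static boolean leaked(int interruptsDuringDelete) {
        String path = "/lock/member-0001";
        nodes.clear();
        try {
            acquire(path, interruptsDuringDelete);
        } catch (Exception expected) {
            // acquire always fails in this simulation
        }
        return nodes.contains(path);
    }

    public static void main(String[] args) {
        System.out.println(leaked(0)); // false: cleanup ran, node removed
        System.out.println(leaked(1)); // true: second interruption left the node behind
    }
}
```

With one interruption during the delete, the node survives the failed acquire, which is the state that blocks every later contender.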
> In my test, I am certain that both of these InterruptedExceptions are due to
> repeated fluctuation in the ConnectionStateManager's connection state. When
> the state ceases to fluctuate, no leader can be selected due to the
> persistence of the node we failed to delete.
> I was able to address this bug with a solution similar to CURATOR-45: if
> pathInForeground is interrupted by an InterruptedException, I schedule a
> BackgroundCallback to attempt pathInForeground again. This task is able to
> delete the path once the connection is stable, after which the mutex can be
> acquired by the new leader.
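
The general idea of falling back to a background retry when the foreground delete is interrupted might look roughly like the sketch below. This is not the reporter's patch and it does not use Curator's BackgroundCallback machinery: a plain ScheduledExecutorService stands in for the background path, the namespace is an in-memory set, and all names (BackgroundDeleteRetryDemo, deleteInForeground, deleteWithBackgroundFallback, nodeRemovedAfter) are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; a plain executor stands in for Curator's background handling.
public class BackgroundDeleteRetryDemo {
    // In-memory stand-in for the ZooKeeper namespace.
    static final Set<String> nodes = ConcurrentHashMap.newKeySet();
    // How many upcoming delete attempts will be interrupted (a flapping connection).
    static final AtomicInteger interruptsLeft = new AtomicInteger();

    // Simulated foreground delete that fails while interruptions remain.
    static void deleteInForeground(String path) throws InterruptedException {
        if (interruptsLeft.get() > 0) {
            interruptsLeft.decrementAndGet();
            throw new InterruptedException("interrupted during delete");
        }
        nodes.remove(path);
    }

    // Try the delete now; if it is interrupted, reschedule it instead of giving up.
    // Real code should also restore the interrupt flag on the calling thread with
    // Thread.currentThread().interrupt(); that is omitted here because the
    // interruption is simulated.
    static void deleteWithBackgroundFallback(String path,
                                             ScheduledExecutorService executor,
                                             CountDownLatch done) {
        try {
            deleteInForeground(path);
            done.countDown();
        } catch (InterruptedException e) {
            executor.schedule(
                () -> deleteWithBackgroundFallback(path, executor, done),
                10, TimeUnit.MILLISECONDS);
        }
    }

    // Returns true if the node is gone after the given number of interrupted attempts.
    public static boolean nodeRemovedAfter(int interrupts) {
        String path = "/lock/member-0001";
        nodes.clear();
        nodes.add(path);
        interruptsLeft.set(interrupts);
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(1);
        try {
            deleteWithBackgroundFallback(path, executor, done);
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return !nodes.contains(path);
    }

    public static void main(String[] args) {
        System.out.println(nodeRemovedAfter(3)); // true
    }
}
```

The design choice illustrated here is that an interrupted cleanup is never silently dropped: the delete keeps being rescheduled until it goes through, so the stale node cannot outlive the connection trouble.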
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)