[jira] [Updated] (CURATOR-205) Repeated InterruptedExceptions during mutex acquire leads to LeaderSelector deadlock

Stephen Ingram (JIRA) Wed, 08 Apr 2015 11:34:29 -0700

     [ https://issues.apache.org/jira/browse/CURATOR-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stephen Ingram updated CURATOR-205:
-----------------------------------
    Summary: Repeated InterruptedExceptions during mutex acquire leads to 
LeaderSelector deadlock  (was: Repeated InterruptedExceptions during mutex 
aquire leads to LeaderSelector deadlock)

> Repeated InterruptedExceptions during mutex acquire leads to LeaderSelector 
> deadlock
> ------------------------------------------------------------------------------------
>
>                 Key: CURATOR-205
>                 URL: https://issues.apache.org/jira/browse/CURATOR-205
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 2.7.2
>            Reporter: Stephen Ingram
>
> When an InterruptedException is thrown in internalLockLoop, which is called 
> during mutex.acquire, internalLockLoop sets a "doDelete" flag.  This flag 
> tells its finally clause to delete the lock path that we were trying to 
> create.
> However, in the pathInForeground function of DeleteBuilderImpl, a _second_ 
> InterruptedException may occur before ZooKeeper can delete the specified 
> path.  The RetryLoop machinery in that function retries only on retryable 
> exceptions, a category that does not include InterruptedException.  
> The second InterruptedException then causes pathInForeground to exit 
> without deleting the path, leading to a deadlock where no one can acquire 
> the mutex.
> In my test, I am certain that both of these InterruptedExceptions are due to 
> repeated fluctuation in the ConnectionStateManager's connection state.  When 
> the state ceases to fluctuate, no leader can be selected due to the 
> persistence of the node we failed to delete.
> I was able to address this bug with a solution similar to CURATOR-45:  if the 
> pathInForeground function is interrupted with an InterruptedException, I 
> schedule a BackgroundCallback to attempt pathInForeground again.  Once the 
> connection is stable, this task deletes the path, and the new leader can 
> then acquire the mutex.
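
The workaround described above can be sketched roughly as follows. This is
not Curator's code and not the reporter's patch: PathDeleter,
deleteOrRetryInBackground, scheduleRetry and the retry parameters are all
hypothetical names invented for illustration, and a plain
ScheduledExecutorService stands in for the BackgroundCallback mentioned in
the report. The sketch only shows the general idea: if the cleanup delete is
interrupted, hand it to a background task instead of abandoning the node.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class InterruptSafeDelete {

    /** Hypothetical stand-in for a blocking ZooKeeper delete call. */
    public interface PathDeleter {
        void delete(String path) throws Exception;
    }

    /**
     * Attempts the delete on the calling thread. If that attempt is
     * interrupted, the delete is rescheduled on the executor and the
     * caller's interrupt flag is restored, so the node is not silently
     * left behind.
     */
    public static void deleteOrRetryInBackground(
            PathDeleter deleter,
            String path,
            ScheduledExecutorService executor,
            long retryDelayMs,
            int maxBackgroundAttempts) throws Exception {
        try {
            deleter.delete(path);
        } catch (InterruptedException e) {
            scheduleRetry(deleter, path, executor, retryDelayMs, maxBackgroundAttempts);
            // Preserve the interrupt for the caller after arranging cleanup.
            Thread.currentThread().interrupt();
        }
    }

    /** Retries the delete in the background until it succeeds or attempts run out. */
    private static void scheduleRetry(
            PathDeleter deleter,
            String path,
            ScheduledExecutorService executor,
            long retryDelayMs,
            int attemptsLeft) {
        if (attemptsLeft <= 0) {
            return;
        }
        executor.schedule(() -> {
            try {
                deleter.delete(path);
            } catch (Exception e) {
                // Assumption for this sketch: any failure is retried after the same delay.
                scheduleRetry(deleter, path, executor, retryDelayMs, attemptsLeft - 1);
            }
        }, retryDelayMs, TimeUnit.MILLISECONDS);
    }
}
```

The interrupt flag is restored only after the retry has been scheduled, so
the caller still observes the interruption while the cleanup is no longer
tied to the interrupted thread.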



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
