[ 
https://issues.apache.org/jira/browse/CURATOR-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850200#comment-16850200
 ] 

Cameron McKenzie commented on CURATOR-525:
------------------------------------------

[~mikhailvaliev], do you happen to have a self contained test that can 
reproduce this issue?

> There is a race condition in Curator which might lead to fake SUSPENDED event 
> and ruin CuratorFrameworkImpl inner state 
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CURATOR-525
>                 URL: https://issues.apache.org/jira/browse/CURATOR-525
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 4.2.0
>            Reporter: Mikhail Valiev
>            Priority: Critical
>         Attachments: CuratorFrameworkTest.java, 
> background-thread-infinite-loop.png, curator-race-condition.png, 
> event-watcher-thread.png
>
>
> This was originally found in the 2.11.1 version of Curator, but I tested the 
> latest release as well, and the issue is still there.
> The issue is tied to guaranteed deletes and how it loops infinitely, if 
> called when there is no connection:
> client.delete().guaranteed().forPath(ourPath); 
> [https://curator.apache.org/apidocs/org/apache/curator/framework/api/GuaranteeableDeletable.html]
> This schedules a background operation which attempts to remove the node in 
> infinite loop. Each time a background operation fails due to connection loss 
> it performs a check (validateConnection() function) to see if the main thread 
> is already aware of connection loss, and if it's not - raises the connection 
> loss event. The problem is that this peace of code is also executed by the 
> event watcher thread when connection events are happening - this leads to 
> race condition. So when connection is restored it's easily possible for the 
> main thread to raise RECONNECTED event and after that for background thread 
> to raise SUSPENDED event.
> We might get unlucky and get a "phantom" SUSPENDED event. It breaks Curator 
> inner Connection state and leads to curator behaving unpredictably
> Attached some illustrations and Unit test to reproduce the issue. (Put debug 
> point in validateConnection() )
> *Possible solution*: in CuratorFrameworkImpl class adjust the processEvent() 
> function and add the following:
> if(event.getType() == CuratorEventType.SYNC) {
> connectionStateManager.addStateChange(ConnectionState.RECONNECTED);
> }
> If this is a same state as before - it will be ignored, if background 
> operation succeeded, but we are in SUSPENDED state - this would repair the 
> Curator state and raise RECONNECTED event.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to