[ https://issues.apache.org/jira/browse/CURATOR-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927064#comment-16927064 ]
Jordan Zimmerman commented on CURATOR-525:
------------------------------------------
What version of Curator? [~jschlather]
> There is a race condition in Curator which might lead to fake SUSPENDED event
> and ruin CuratorFrameworkImpl inner state
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: CURATOR-525
> URL: https://issues.apache.org/jira/browse/CURATOR-525
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 4.2.0
> Reporter: Mikhail Valiev
> Assignee: Cameron McKenzie
> Priority: Critical
> Attachments: CuratorFrameworkTest.java,
> background-thread-infinite-loop.png, curator-race-condition.png,
> event-watcher-thread.png
>
>
> This was originally found in the 2.11.1 version of Curator, but I tested the
> latest release as well, and the issue is still there.
> The issue is tied to guaranteed deletes, which loop indefinitely when called
> while there is no connection:
> client.delete().guaranteed().forPath(ourPath);
> [https://curator.apache.org/apidocs/org/apache/curator/framework/api/GuaranteeableDeletable.html]
> This schedules a background operation which attempts to remove the node in
> an infinite loop. Each time a background operation fails due to connection
> loss, it performs a check (the validateConnection() function) to see whether
> the main thread is already aware of the connection loss and, if it is not,
> raises the connection loss event. The problem is that this piece of code is
> also executed by the event watcher thread when connection events occur,
> which leads to a race condition. So when the connection is restored, it is
> easily possible for the main thread to raise a RECONNECTED event and, after
> that, for the background thread to raise a SUSPENDED event.
> We might get unlucky and get a "phantom" SUSPENDED event. It breaks Curator's
> inner connection state and leads to Curator behaving unpredictably.
> Attached are some illustrations and a unit test to reproduce the issue. (Put
> a debug point in validateConnection().)
> *Possible solution*: in the CuratorFrameworkImpl class, adjust the
> processEvent() function and add the following:
> if (event.getType() == CuratorEventType.SYNC) {
>     connectionStateManager.addStateChange(ConnectionState.RECONNECTED);
> }
> If this is the same state as before, it will be ignored. If a background
> operation succeeded but we are in the SUSPENDED state, this would repair the
> Curator state and raise a RECONNECTED event.
>
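The interleaving described in the quoted report can be replayed in a small single-threaded model, which may make the ordering problem easier to see. This is only a sketch under stated assumptions: State and StateModel are hypothetical stand-ins for Curator's ConnectionState and ConnectionStateManager, not the real classes, and the only behaviour borrowed from the report is that a change to the same state is ignored. Step 5 models the proposed fix, where a successful background event (SYNC in the report's snippet) is treated as proof of connectivity.

```java
import java.util.ArrayList;
import java.util.List;

class PhantomSuspendedSketch {
    // Hypothetical, reduced version of Curator's ConnectionState.
    enum State { CONNECTED, SUSPENDED, RECONNECTED }

    // Hypothetical, reduced version of ConnectionStateManager.
    static class StateModel {
        private State current = State.CONNECTED;
        final List<State> events = new ArrayList<>();

        // A change to the state we are already in is ignored, as the report describes.
        synchronized boolean addStateChange(State next) {
            if (next == current) {
                return false;
            }
            current = next;
            events.add(next);
            return true;
        }

        synchronized State current() {
            return current;
        }
    }

    // Replays the problematic interleaving in a fixed order instead of using real threads.
    static StateModel replay(boolean applyProposedFix) {
        StateModel model = new StateModel();

        // 1. The connection drops; the event watcher thread reports it.
        model.addStateChange(State.SUSPENDED);

        // 2. A background guaranteed delete fails with connection loss,
        //    but its handling is delayed (assumed: the thread is descheduled).
        boolean staleConnectionLossPending = true;

        // 3. The connection comes back; the event watcher thread reports it.
        model.addStateChange(State.RECONNECTED);

        // 4. The delayed background handler runs now and raises SUSPENDED
        //    although the client is connected: the "phantom" event.
        if (staleConnectionLossPending) {
            model.addStateChange(State.SUSPENDED);
        }

        // 5. The retried background operation succeeds. With the proposed fix,
        //    that success re-asserts RECONNECTED and repairs the state.
        if (applyProposedFix) {
            model.addStateChange(State.RECONNECTED);
        }
        return model;
    }

    public static void main(String[] args) {
        StateModel withoutFix = replay(false);
        StateModel withFix = replay(true);
        System.out.println("without fix: " + withoutFix.events + " -> " + withoutFix.current());
        System.out.println("with fix:    " + withFix.events + " -> " + withFix.current());
    }
}
```

Without step 5 the model ends in SUSPENDED even though the last real connection event was a reconnect. With step 5 the final state is RECONNECTED, and if no phantom event had occurred the extra call would be a no-op.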
--
This message was sent by Atlassian Jira
(v8.3.2#803003)