[
https://issues.apache.org/jira/browse/CURATOR-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265455#comment-15265455
]
Jordan Zimmerman commented on CURATOR-320:
------------------------------------------
A pull request with a fix would be appreciated.
> Discovery reregister triggered even if retry policy succeeds. Connection
> looping condition.
> ------------------------------------------------------------------------------------------
>
> Key: CURATOR-320
> URL: https://issues.apache.org/jira/browse/CURATOR-320
> Project: Apache Curator
> Issue Type: Bug
> Components: Client, Framework
> Affects Versions: TBD, 2.10.0
> Environment: 3 server Quorum running on individual AWS boxes.
> Session timeout set to 1-2 min on most clients.
> Reporter: Running Fly
> Fix For: TBD
>
>
> ServiceDiscoveryImpl.reRegisterServices() can be triggered by the
> ConnectionState events RECONNECTED and CONNECTED, which causes the
> reRegisterServices() method to run on the ConnectionStateManager thread. If a
> connection drops while reRegisterServices() is running, it will be recovered by
> the retry policy. However, the ConnectionState SUSPENDED and subsequent
> RECONNECTED events will be queued but not fired until reRegisterServices()
> completes (the ConnectionStateManager thread fires these events but is in use).
> When it does complete, the RECONNECTED event in the queue will fire and
> reRegisterServices() will rerun.
> When ZooKeeper's server connection is interrupted, all of the clients will
> simultaneously call reRegisterServices(). This overloads the server with
> requests, causing connections to time out and reset, which queues up more
> RECONNECTED events. This state can persist indefinitely.
> Because reRegisterServices() will most likely receive a
> NodeExistsException, it deletes and recreates the node, effectively causing
> the services to thrash up and down and wreaking havoc on our service dependency
> chain.
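
The queuing behaviour described in the report can be illustrated with a small, self-contained simulation. This is only a sketch and not Curator code: the class ReRegisterLoopSketch, the method countReRegisterRuns and the parameter runsWithDrop are hypothetical names, and a plain queue drained in a loop stands in for the single ConnectionStateManager thread. It assumes each connection drop during a re-registration run is recovered by the retry policy and leaves one SUSPENDED and one RECONNECTED event in the queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical simulation; none of these names come from Apache Curator.
class ReRegisterLoopSketch {
    enum State { CONNECTED, SUSPENDED, RECONNECTED }

    /**
     * Drains a queue of connection events on one "thread", the way a single
     * dispatcher thread would. Re-registration runs for every CONNECTED or
     * RECONNECTED event.
     *
     * @param runsWithDrop how many of the first re-registration runs suffer a
     *                     connection drop that the retry policy recovers from
     * @return how many times re-registration ran in total
     */
    static int countReRegisterRuns(int runsWithDrop) {
        Queue<State> events = new ArrayDeque<>();
        events.add(State.CONNECTED); // initial connection event
        int runs = 0;

        while (!events.isEmpty()) {
            State state = events.poll();
            if (state == State.CONNECTED || state == State.RECONNECTED) {
                runs++; // stand-in for a re-registration run
                if (runs <= runsWithDrop) {
                    // The drop is recovered inside the run, but its events
                    // can only be delivered after the run returns.
                    events.add(State.SUSPENDED);
                    events.add(State.RECONNECTED);
                }
            }
            // SUSPENDED triggers no re-registration in this sketch.
        }
        return runs;
    }

    public static void main(String[] args) {
        for (int drops = 0; drops <= 3; drops++) {
            System.out.println("runs with a recovered drop: " + drops
                    + ", total re-registration runs: " + countReRegisterRuns(drops));
        }
    }
}
```

In this model every recovered drop costs one additional full re-registration, even though the retry policy already succeeded. If the extra load from those runs causes further drops, the queue never empties, which matches the looping condition reported above.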
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)