[ 
https://issues.apache.org/jira/browse/CURATOR-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265455#comment-15265455
 ] 

Jordan Zimmerman commented on CURATOR-320:
------------------------------------------

A pull request with a fix would be appreciated.

> Discovery reregister triggered even if retry policy succeeds. Connection 
> looping condition.
> ------------------------------------------------------------------------------------------
>
>                 Key: CURATOR-320
>                 URL: https://issues.apache.org/jira/browse/CURATOR-320
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client, Framework
>    Affects Versions: TBD, 2.10.0
>         Environment: 3 server Quorum running on individual AWS boxes.
> Session timeout set to 1-2 min on most clients.
>            Reporter: Running Fly
>             Fix For: TBD
>
>
>     ServiceDiscoveryImpl.reRegisterServices() can be triggered by the 
> ConnectionState events RECONNECTED and CONNECTED, which causes it to run on 
> the ConnectionStateManager thread. If the connection drops while 
> reRegisterServices() is running, it is recovered by the retry policy. 
> However, the resulting SUSPENDED and RECONNECTED ConnectionState events are 
> queued but not fired until reRegisterServices() completes, because the 
> ConnectionStateManager thread that fires them is still in use. When it does 
> complete, the queued RECONNECTED event fires and reRegisterServices() reruns.
>     When the ZooKeeper server connection is interrupted, all of the clients 
> call reRegisterServices() simultaneously. This overloads the server with 
> requests, causing connections to time out and reset, which queues up more 
> RECONNECTED events. This state can persist indefinitely.
>     Because reRegisterServices() will most likely receive a 
> NodeExistsException, it deletes and recreates the node, effectively causing 
> the services to thrash up and down and wreaking havoc on our service 
> dependency chain. 
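The looping condition described above can be sketched in miniature. The following is a hypothetical, self-contained illustration (not Curator's actual API or the proposed fix): it models the idea of recording, at the end of reRegisterServices(), which "connection epoch" the run already covered, so that a RECONNECTED event queued while the run was in progress is recognized as stale and skipped instead of triggering another re-registration. The class and method names here are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: coalesce queued RECONNECTED events by recording,
// at the END of a re-registration run, the connection epoch that the run
// (including any retry-policy recoveries inside it) already covered.
class ReRegistrar {
    private final AtomicLong epoch = new AtomicLong(); // bumped on SUSPENDED
    private long handledEpoch = -1;                    // epoch covered by last run
    int runs = 0;                                      // counts re-registrations

    // Called when the connection drops; a later RECONNECTED for this
    // drop will carry a new epoch.
    void onSuspended() {
        epoch.incrementAndGet();
    }

    // Called on RECONNECTED. midRun simulates work (and possible
    // retry-policy recoveries) happening while re-registration executes.
    void onReconnected(Runnable midRun) {
        if (epoch.get() == handledEpoch) {
            return; // stale event queued while the previous run was active
        }
        runs++;
        midRun.run();               // connection may drop and recover here
        handledEpoch = epoch.get(); // record AFTER retries have settled
    }
}
```

In this model, a connection drop that the retry policy absorbs mid-run no longer causes the queued RECONNECTED event to rerun the registration, which is the core of the loop reported here.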



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
