Running Fly created CURATOR-320:
-----------------------------------

             Summary: Discovery reregister triggered even if retry policy 
succeeds.
                 Key: CURATOR-320
                 URL: https://issues.apache.org/jira/browse/CURATOR-320
             Project: Apache Curator
          Issue Type: Bug
          Components: Client, Framework
    Affects Versions: 2.10.0, TBD
          Environment: 3-server quorum running on individual AWS boxes.
Session timeout set to 1-2 min on most clients.
            Reporter: Running Fly
             Fix For: TBD


    ServiceDiscoveryImpl.reRegisterServices() can be triggered by the 
ConnectionState events RECONNECTED and CONNECTED, which causes 
reRegisterServices() to run on the ConnectionStateManager thread. If the 
connection drops while reRegisterServices() is running, the retry policy 
recovers it. However, the resulting SUSPENDED and RECONNECTED events are 
queued but not fired until reRegisterServices() completes, because the 
ConnectionStateManager thread that fires these events is still busy. When 
it does complete, the queued RECONNECTED event fires and reRegisterServices() 
runs again.
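A minimal, self-contained Java sketch of the queuing behavior described above, assuming the single ConnectionStateManager thread can be modeled as a FIFO queue drained one event at a time. ReRegisterSimulation, post(), dispatchAll() and dropsDuringRun are hypothetical illustration names, not Curator APIs; only RECONNECTED is modeled as a trigger, and the stub reRegisterServices() only counts how often it runs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ReRegisterSimulation {
    // Hypothetical stand-in for the ConnectionState values named in the report.
    enum State { SUSPENDED, RECONNECTED }

    // Events waiting for the single (simulated) dispatch thread.
    private final Deque<State> queue = new ArrayDeque<>();
    // Order in which events were actually delivered to the listener.
    final List<State> delivered = new ArrayList<>();
    private int reRegisterRuns = 0;
    // How many re-registration runs suffer a connection drop mid-run (hypothetical knob).
    private final int dropsDuringRun;

    ReRegisterSimulation(int dropsDuringRun) {
        this.dropsDuringRun = dropsDuringRun;
    }

    // The client reports a state change; it is only queued here.
    void post(State s) {
        queue.addLast(s);
    }

    // Models the single event thread: one event at a time, and a listener
    // that is still running holds back every event queued behind it.
    int dispatchAll() {
        while (!queue.isEmpty()) {
            State s = queue.pollFirst();
            delivered.add(s);
            if (s == State.RECONNECTED) {
                reRegisterServices();
            }
        }
        return reRegisterRuns;
    }

    // Stand-in for ServiceDiscoveryImpl.reRegisterServices().
    private void reRegisterServices() {
        reRegisterRuns++;
        if (reRegisterRuns <= dropsDuringRun) {
            // The connection drops and the retry policy recovers it, so this
            // run succeeds. The state events are still produced, but they can
            // only be queued behind this still-running listener.
            post(State.SUSPENDED);
            post(State.RECONNECTED);
        }
    }

    public static void main(String[] args) {
        ReRegisterSimulation sim = new ReRegisterSimulation(1);
        sim.post(State.RECONNECTED);
        int runs = sim.dispatchAll();
        System.out.println(runs + " runs, events: " + sim.delivered);
    }
}
```

With one mid-run drop, main prints `2 runs, events: [RECONNECTED, SUSPENDED, RECONNECTED]`: the first run succeeded thanks to the retry policy, yet the stale RECONNECTED still triggers a second full re-registration.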
    When the ZooKeeper server connection is interrupted, all of the clients 
simultaneously call reRegisterServices(). This overloads the server with 
requests, causing connections to time out and reset, which queues up more 
RECONNECTED events. This state can persist indefinitely.
    Because reRegisterServices() will most likely receive a 
NodeExistsException, it deletes and recreates the node, effectively causing 
the services to thrash up and down. This wreaks havoc on our service 
dependency chain.
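A toy Java sketch of why each redundant re-run is visible to consumers, assuming an in-memory map in place of ZooKeeper and a hypothetical NodeExistsException class (not the ZooKeeper one). The hypothetical reRegister() follows the create, catch NodeExistsException, delete, create sequence described above, and every create/delete is logged the way a watcher on the service path would observe it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ThrashSimulation {
    // Hypothetical stand-in for ZooKeeper's NodeExistsException.
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super(path);
        }
    }

    // Toy registry: path -> data. Each create/delete is recorded in 'events'.
    static class Registry {
        private final Map<String, String> nodes = new HashMap<>();
        final List<String> events = new ArrayList<>();

        void create(String path, String data) throws NodeExistsException {
            if (nodes.containsKey(path)) {
                throw new NodeExistsException(path);
            }
            nodes.put(path, data);
            events.add("UP " + path);
        }

        void delete(String path) {
            if (nodes.remove(path) != null) {
                events.add("DOWN " + path);
            }
        }
    }

    // Create; on NodeExistsException delete the existing node and create it again.
    static void reRegister(Registry r, String path, String data) {
        try {
            r.create(path, data);
        } catch (NodeExistsException e) {
            r.delete(path);
            try {
                r.create(path, data);
            } catch (NodeExistsException again) {
                // Not reachable in this single-threaded toy.
                throw new IllegalStateException(again);
            }
        }
    }

    public static void main(String[] args) {
        Registry r = new Registry();
        String path = "/services/example/instance-1"; // hypothetical path
        reRegister(r, path, "host:port");             // initial registration
        for (int i = 0; i < 3; i++) {
            reRegister(r, path, "host:port");         // redundant re-runs
        }
        System.out.println(r.events);
    }
}
```

After one initial registration and three redundant re-runs the log holds seven entries alternating UP and DOWN, so each unnecessary re-run briefly takes the instance down and brings it back.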



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
