Running Fly created CURATOR-320:
-----------------------------------
Summary: Discovery re-register triggered even if retry policy
succeeds.
Key: CURATOR-320
URL: https://issues.apache.org/jira/browse/CURATOR-320
Project: Apache Curator
Issue Type: Bug
Components: Client, Framework
Affects Versions: 2.10.0, TBD
Environment: 3-server quorum running on individual AWS boxes.
Session timeout set to 1-2 min on most clients.
Reporter: Running Fly
Fix For: TBD
ServiceDiscoveryImpl.reRegisterServices() can be triggered by the
ConnectionState events RECONNECTED and CONNECTED, which causes
reRegisterServices() to run on the ConnectionStateManager thread. If a
connection drops while reRegisterServices() is running, the retry policy
recovers it. However, the resulting ConnectionState SUSPENDED and RECONNECTED
events are queued and not fired until reRegisterServices() completes, because
the ConnectionStateManager thread that fires these events is still in use.
When it does complete, the queued RECONNECTED event fires and
reRegisterServices() runs again.
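The queuing behaviour described above can be reproduced without ZooKeeper. The following is only a minimal sketch: it assumes a single-threaded executor is a fair stand-in for the ConnectionStateManager thread, and the names QueuedReconnectDemo, State, and simulate() are hypothetical and not part of Curator.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical simulation; none of these names are Curator APIs.
class QueuedReconnectDemo {
    enum State { CONNECTED, SUSPENDED, RECONNECTED }

    /** Returns how many times the simulated re-registration ran. */
    static int simulate() {
        // Assumption: one thread delivers all connection state events in order.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        AtomicInteger reRegisterRuns = new AtomicInteger();
        CountDownLatch firstRunStarted = new CountDownLatch(1);
        CountDownLatch connectionRecovered = new CountDownLatch(1);

        // Stand-in for the discovery listener: re-register on (RE)CONNECTED.
        Consumer<State> listener = state -> {
            if (state != State.CONNECTED && state != State.RECONNECTED) {
                return;
            }
            if (reRegisterRuns.incrementAndGet() == 1) {
                // First run: block the event thread until the "retry policy"
                // has recovered the dropped connection.
                firstRunStarted.countDown();
                awaitQuietly(connectionRecovered);
            }
        };

        try {
            eventThread.execute(() -> listener.accept(State.RECONNECTED));
            firstRunStarted.await();
            // The connection drops and recovers while the first run is still
            // in progress; both events can only be queued behind it.
            eventThread.execute(() -> listener.accept(State.SUSPENDED));
            eventThread.execute(() -> listener.accept(State.RECONNECTED));
            connectionRecovered.countDown();
            eventThread.shutdown();
            eventThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdownNow();
        }
        return reRegisterRuns.get();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("reRegisterServices runs: " + simulate());
    }
}
```

In this sketch the first run succeeds after the connection recovers, yet the queued RECONNECTED event still triggers a second run once the event thread is free.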
When the ZooKeeper server connection is interrupted, all of the clients
call reRegisterServices() simultaneously. This overloads the server with
requests, causing connections to time out and reset, which queues up more
RECONNECTED events. This state can persist indefinitely.
Because reRegisterServices() will most likely receive a
NodeExistsException, it deletes and recreates the node. This effectively causes
the services to thrash up and down, wreaking havoc on our service dependency
chain.
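To illustrate why a redundant re-registration is visible to consumers, here is a rough sketch of the delete-and-recreate pattern against an in-memory map. It assumes that watchers observe every create and delete. FakeRegistry, register(), and the path used are hypothetical illustrations, and an IllegalStateException stands in for NodeExistsException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation; none of these names are Curator or ZooKeeper APIs.
class ReRegisterThrashDemo {
    /** In-memory stand-in for the ZooKeeper tree that logs what a watcher would see. */
    static final class FakeRegistry {
        final Map<String, String> nodes = new HashMap<>();
        final List<String> watcherEvents = new ArrayList<>();

        void create(String path, String data) {
            if (nodes.containsKey(path)) {
                // Stands in for NodeExistsException.
                throw new IllegalStateException("node exists: " + path);
            }
            nodes.put(path, data);
            watcherEvents.add("ADDED " + path);
        }

        void delete(String path) {
            if (nodes.remove(path) != null) {
                watcherEvents.add("REMOVED " + path);
            }
        }
    }

    /** Registers a node; if it already exists, deletes and recreates it. */
    static void register(FakeRegistry registry, String path, String data) {
        try {
            registry.create(path, data);
        } catch (IllegalStateException nodeExists) {
            registry.delete(path);
            registry.create(path, data);
        }
    }

    static List<String> simulate() {
        FakeRegistry registry = new FakeRegistry();
        String path = "/services/example/instance-1"; // hypothetical path
        register(registry, path, "host:port");        // initial registration
        register(registry, path, "host:port");        // redundant re-registration
        return registry.watcherEvents;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

Under these assumptions, the second call produces a REMOVED event followed by an ADDED event for an instance that never actually went away. That is the up-and-down thrash described above.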
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)