[ https://issues.apache.org/jira/browse/CURATOR-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126360#comment-17126360 ]
Rhys Yarranton commented on CURATOR-570:
----------------------------------------
No unit test. What you describe sounds plausible for 4.3 (_i.e._, given the
fix for CURATOR-551). If I understand correctly:
* At first FixedEnsembleProvider will have the connection string provided to
Curator at start-up. This value should also make its way into Helper.
* After connect or reconnect or the watch fires, EnsembleTracker will
determine a (possibly different) connection string according to the ZK servers
based on the special config node. This is a background operation. Once
determined, this value gets put into the FixedEnsembleProvider, but is not
immediately pushed into Helper.
* At a later time, say connection suspended, HandleHolder will check the
FixedEnsembleProvider value against the Helper value. If they differ, the
FixedEnsembleProvider value will be returned. (In 4.3, not 4.2.)
* Shortly after, but in a different method
(ConnectionState.handleNewConnectionString), this new value will be put back
into Helper. (So there may be a window for multiple check-fails to amass
before it resolves itself, though it seems it should be small.)
Remark: ExhibitorEnsembleProvider is different from FixedEnsembleProvider for
our purposes. It ignores the value from EnsembleTracker.
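
The sequence in the bullets above can be sketched as a toy model. This is only an illustration of my reading of the flow, not Curator code: the class ConnectionStringFlow and its members are hypothetical stand-ins for FixedEnsembleProvider, Helper and EnsembleTracker, and the counter stands in for calls to ZooKeeper.updateServerList. It is single-threaded, so it does not model the window for multiple check-fails.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only; none of these names are real Curator classes or methods.
final class ConnectionStringFlow {
    // Stands in for the value held by FixedEnsembleProvider.
    private final AtomicReference<String> providerValue;
    // Stands in for the value held by Helper.
    private String helperValue;
    // Counts simulated calls to ZooKeeper.updateServerList.
    int updateServerListCalls = 0;

    ConnectionStringFlow(String initialConnectionString) {
        // At start-up both sides hold the string given to Curator.
        providerValue = new AtomicReference<>(initialConnectionString);
        helperValue = initialConnectionString;
    }

    // Background update derived from the config node (EnsembleTracker's role).
    // Only the provider value changes; the helper value is left stale.
    void trackerUpdate(String fromConfigNode) {
        providerValue.set(fromConfigNode);
    }

    // Later check (e.g. on connection suspended): compare the two values and,
    // if they differ, "update the server list" and sync the helper value.
    boolean checkState() {
        String current = providerValue.get();
        if (current.equals(helperValue)) {
            return false; // nothing to do
        }
        updateServerListCalls++;
        helperValue = current;
        return true;
    }

    public static void main(String[] args) {
        ConnectionStringFlow flow = new ConnectionStringFlow("a:2181,b:2181");
        flow.trackerUpdate("b:2181,a:2181"); // same servers, different string
        System.out.println(flow.checkState()); // true: the one glitch
        System.out.println(flow.checkState()); // false: values now agree
    }
}
```

In this model the mismatch is detected once and then clears, which matches the "one glitch" expectation for 4.3.
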
Is the value from EnsembleTracker consistent, apart from genuine changes? The
servers are in a HashMap with a key of the server ID. There are things that
could make that inconsistent, like two keys in the same bucket inserted in
different orders. In practice this seems like it would be rare, maybe even
never. (Of course, ZooKeeper could change it to an inconsistent
implementation, not that there's any obvious reason to.)
You could guard against this if you wanted.
EnsembleTracker.configToConnectionString is Curator code. It could impose an
order on the servers, _e.g._, by interjecting a TreeMap.
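
A minimal sketch of the TreeMap idea, under simplifying assumptions: the servers are represented here as a plain map of server ID to "host:port", and the class OrderedConnectionString and its method are hypothetical, not the actual signature of EnsembleTracker.configToConnectionString.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not Curator's actual code.
final class OrderedConnectionString {
    // Builds a comma-separated connection string from server ID -> "host:port".
    static String toConnectionString(Map<Long, String> serversById) {
        // Copying into a TreeMap imposes ascending server-ID order,
        // whatever the iteration order of the map passed in.
        return new TreeMap<>(serversById).values().stream()
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // LinkedHashMap is used here only to force a non-sorted input order.
        Map<Long, String> servers = new LinkedHashMap<>();
        servers.put(3L, "zk3:2181");
        servers.put(1L, "zk1:2181");
        servers.put(2L, "zk2:2181");
        // prints zk1:2181,zk2:2181,zk3:2181
        System.out.println(toConnectionString(servers));
    }
}
```

With a fixed order, two reads of the same ensemble always yield the same string, so a plain string comparison would only report genuine changes.
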
So ... absent HashMap madness, and unless values are getting reset somewhere I
haven't noticed ... in 4.3 there could be one glitch. Or possibly one glitch
detected more than once due to multiple threads. But after that the value
should ultimately have come from EnsembleTracker and things should clear up.
Caveat, I do not pretend to understand this code well.
> Excessive calls to ZooKeeper.updateServerList (which can result in session
> death)
> ---------------------------------------------------------------------------------
>
> Key: CURATOR-570
> URL: https://issues.apache.org/jira/browse/CURATOR-570
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 4.2.0, 4.3.0
> Reporter: Rhys Yarranton
> Priority: Major
>
> On suspend and reconnect, Curator calls ZooKeeper.updateServerList via
> ConnectionState.checkState --> ConnectionState.handleNewConnectionString. In
> addition, recipes may be triggered by this as well, and they too make calls
> to ZooKeeper.updateServerList via ConnectionState.checkTimeouts -->
> ConnectionState.handleNewConnectionString.
> This happens even though the connection string has not actually changed.
> Due to ZOOKEEPER-3825, this can cause the connection to be closed
> immediately. On its own this would be perceived as a glitch. But due to the
> Curator-induced calls, what we see is a cycle of SUSPENDED/RECONNECTED, until
> eventually the session dies and a new session is recreated.
> Based on the source code (at time of writing), ZooKeeper.updateServerList is
> not intended to be called frequently like this.