[jira] [Commented] (CURATOR-570) Excessive calls to ZooKeeper.updateServerList (which can result in session death)

Rhys Yarranton (Jira) Thu, 04 Jun 2020 21:02:26 -0700


    [ 
https://issues.apache.org/jira/browse/CURATOR-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126360#comment-17126360
 ]


Rhys Yarranton commented on CURATOR-570:
----------------------------------------

No unit test.  What you describe sounds plausible for 4.3 (_i.e._, given the 
fix for CURATOR-551).  If I understand correctly:
 * At first FixedEnsembleProvider will have the connection string provided to 
Curator at start-up.  This value should also make its way into Helper.
 * After connect or reconnect or the watch fires, EnsembleTracker will 
determine a (possibly different) connection string from the ZK servers listed 
in the special config node.  This is a background operation.  Once 
determined, this value gets put into the FixedEnsembleProvider, but is not 
immediately pushed into Helper.
 * At a later time, say when the connection is suspended, HandleHolder will 
check the FixedEnsembleProvider value against the Helper value.  If they 
differ, the FixedEnsembleProvider value will be returned.  (In 4.3, not 4.2.)
 * Shortly after, but in a different method 
(ConnectionState.handleNewConnectionString), this new value will be put back 
into Helper.  (So there may be a window for multiple check-fails to amass 
before it resolves itself, though it seems it should be small.)
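
The sequence above can be sketched as a toy model.  This is not Curator code: 
the class, fields and methods below are hypothetical stand-ins for 
FixedEnsembleProvider, Helper, EnsembleTracker and ConnectionState, and the 
model is single-threaded, so it does not reproduce the multi-thread window.

```java
class ConnectionStringWindow {
    // Stand-in for the connection string held by FixedEnsembleProvider.
    private String providerValue;
    // Stand-in for the copy held by Helper.
    private String helperValue;
    // Stand-in for the number of calls to ZooKeeper.updateServerList.
    public int updateCalls = 0;

    public ConnectionStringWindow(String initial) {
        providerValue = initial;
        helperValue = initial;
    }

    // EnsembleTracker-style background result: updates the provider only.
    public void trackerUpdate(String fromConfigNode) {
        providerValue = fromConfigNode;
    }

    // HandleHolder-style check: returns the provider value if it differs
    // from the helper value, otherwise null.
    private String newConnectionStringOrNull() {
        return providerValue.equals(helperValue) ? null : providerValue;
    }

    // handleNewConnectionString-style step: pushes the value into the helper.
    private void handleNewConnectionString(String value) {
        updateCalls++;
        helperValue = value;
    }

    // checkState-style step, e.g. run on suspend or reconnect.
    public void checkState() {
        String changed = newConnectionStringOrNull();
        if (changed != null) {
            handleNewConnectionString(changed);
        }
    }

    public static void main(String[] args) {
        ConnectionStringWindow w = new ConnectionStringWindow("zk1:2181,zk2:2181");
        w.trackerUpdate("zk2:2181,zk1:2181"); // same servers, different string
        w.checkState(); // mismatch detected, value pushed into the helper
        w.checkState(); // values now agree, nothing to do
        System.out.println(w.updateCalls); // prints 1
    }
}
```

In this model the mismatch is acted on once, after which the two values agree 
until the tracker produces a different string again.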

Remark: ExhibitorEnsembleProvider is different from FixedEnsembleProvider for 
our purposes.  It ignores the value from EnsembleTracker.

Is the value from EnsembleTracker consistent, apart from genuine changes?  The 
servers are in a HashMap with a key of the server ID.  There are things that 
could make that inconsistent, like two keys in the same bucket inserted in 
different orders.  In practice this seems like it would be rare, maybe even 
never.  (Of course, ZooKeeper could change it to an inconsistent 
implementation, not that there's any obvious reason to.)
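
To illustrate the bucket-collision case, here is a small stand-alone sketch.  
It assumes a plain java.util.HashMap keyed by Long server IDs with String 
values (the real map holds ZooKeeper server objects), and it picks IDs 1 and 
17 on the assumption that they share a bucket at the default capacity of 16.  
Iteration order is unspecified, so the two joined strings may or may not 
differ on a given JVM.

```java
import java.util.HashMap;
import java.util.Map;

class BucketOrderDemo {
    // Builds a map of server ID -> host:port, inserting IDs in the given order.
    public static Map<Long, String> build(long... serverIds) {
        Map<Long, String> servers = new HashMap<>();
        for (long id : serverIds) {
            servers.put(id, "zk" + id + ":2181");
        }
        return servers;
    }

    // Joins the values in whatever order the map happens to iterate them.
    public static String join(Map<Long, String> servers) {
        return String.join(",", servers.values());
    }

    public static void main(String[] args) {
        Map<Long, String> a = build(1L, 17L);
        Map<Long, String> b = build(17L, 1L);
        System.out.println(a.equals(b)); // prints true: same entries
        System.out.println(join(a));
        System.out.println(join(b));     // order may differ from the line above
    }
}
```

The two maps are equal as maps, yet a connection string built by iterating 
them is only as stable as the iteration order.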

You could guard against this if you wanted.  
EnsembleTracker.configToConnectionString is Curator code.  It could impose an 
order on the servers, _e.g._, by interjecting a TreeMap.
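
A minimal sketch of that idea, not the actual Curator method: 
toConnectionString is a hypothetical helper, and the Map<Long, String> of 
server ID to host:port is an assumed simplification of the real config 
object.  Copying into a TreeMap fixes the order by server ID regardless of 
how the source map iterates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SortedConnectionString {
    // Hypothetical helper: joins host:port entries in ascending server-ID order.
    public static String toConnectionString(Map<Long, String> serversById) {
        // TreeMap iterates its values in ascending key order.
        return String.join(",", new TreeMap<>(serversById).values());
    }

    public static void main(String[] args) {
        Map<Long, String> servers = new HashMap<>();
        servers.put(3L, "zk3:2181");
        servers.put(1L, "zk1:2181");
        servers.put(2L, "zk2:2181");
        // prints zk1:2181,zk2:2181,zk3:2181
        System.out.println(toConnectionString(servers));
    }
}
```

With a deterministic order, an unchanged ensemble always yields the same 
string, so a plain string comparison would only fail on a genuine change.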

So ... absent HashMap madness, and unless values are getting reset somewhere I 
haven't noticed ... in 4.3 there could be one glitch.  Or possibly one glitch 
detected more than once due to multiple threads.  But after that the value 
should ultimately have come from EnsembleTracker and things should clear up.

Caveat, I do not pretend to understand this code well.

> Excessive calls to ZooKeeper.updateServerList (which can result in session 
> death)
> ---------------------------------------------------------------------------------
>
>                 Key: CURATOR-570
>                 URL: https://issues.apache.org/jira/browse/CURATOR-570
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 4.2.0, 4.3.0
>            Reporter: Rhys Yarranton
>            Priority: Major
>
> On suspend and reconnect, Curator calls ZooKeeper.updateServerList via 
> ConnectionState.checkState --> ConnectionState.handleNewConnectionString.  In 
> addition, recipes may be triggered by this as well, and they too make calls 
> to ZooKeeper.updateServerList via ConnectionState.checkTimeouts --> 
> ConnectionState.handleNewConnectionString.
> This happens even though the connection string has not actually changed.
> Due to ZOOKEEPER-3825, this can cause the connection to be closed 
> immediately.  On its own this would be perceived as a glitch.  But due to the 
> Curator-induced calls, what we see is a cycle of SUSPENDED/RECONNECTED, until 
> eventually the session dies and a new session is recreated.
> Based on the source code (at time of writing), ZooKeeper.updateServerList is 
> not intended to be called frequently like this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
