[
https://issues.apache.org/jira/browse/CURATOR-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124622#comment-17124622
]
Rhys Yarranton commented on CURATOR-570:
----------------------------------------
Cam raises a good question. I just put a debugger on a local test program and
found helperConnectionString and ensembleProviderConnectionString had
superficially different values, one using host names and the other using IP
addresses. Looking further, disagreement can also happen due to servers being
in a different order (which in our case appears to be possible, even likely).
I also notice that the 4.2 and 4.3 versions of getNewConnectionString differ.
Here is the 4.2 version:
{code:java}
String getNewConnectionString()
{
    String helperConnectionString = (helper != null) ? helper.getConnectionString() : null;
    return ((helperConnectionString != null) && !ensembleProvider.getConnectionString().equals(helperConnectionString))
        ? helperConnectionString
        : null;
}
{code}
Our worst problems were under 4.2, so it's possible the difference is
significant.
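
To illustrate the ordering point, here is a minimal sketch of an order-insensitive comparison that could stand in for the plain String.equals check. ConnectionStrings, sameServers and normalize are hypothetical names, not part of Curator or ZooKeeper. The sketch assumes the only differences are server order, whitespace, or letter case; it does not resolve host names to IP addresses, so that kind of mismatch would still look like a change.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Curator or ZooKeeper.
final class ConnectionStrings {
    private ConnectionStrings() {}

    // True when both strings name the same servers (and the same chroot),
    // regardless of the order in which the servers are listed.
    static boolean sameServers(String a, String b) {
        if (a == null || b == null) {
            return a == b;
        }
        return normalize(a).equals(normalize(b));
    }

    // Sorts and de-duplicates the comma-separated host:port entries and
    // re-appends the optional "/chroot" suffix unchanged.
    static String normalize(String connectionString) {
        int slash = connectionString.indexOf('/');
        String hosts = (slash < 0) ? connectionString : connectionString.substring(0, slash);
        String chroot = (slash < 0) ? "" : connectionString.substring(slash);

        Set<String> sorted = new TreeSet<>();
        for (String server : hosts.split(",")) {
            String trimmed = server.trim();
            if (!trimmed.isEmpty()) {
                // Host names are case-insensitive, so compare them in lower case.
                sorted.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        return String.join(",", sorted) + chroot;
    }
}
```

With something like this, getNewConnectionString would report a new string only when sameServers returns false, which would avoid a call to ZooKeeper.updateServerList when the list has merely been reordered.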
> Excessive calls to ZooKeeper.updateServerList (which can result in session
> death)
> ---------------------------------------------------------------------------------
>
> Key: CURATOR-570
> URL: https://issues.apache.org/jira/browse/CURATOR-570
> Project: Apache Curator
> Issue Type: Bug
> Components: Framework
> Affects Versions: 4.2.0, 4.3.0
> Reporter: Rhys Yarranton
> Priority: Major
>
> On suspend and reconnect, Curator calls ZooKeeper.updateServerList via
> ConnectionState.checkState --> ConnectionState.handleNewConnectionString. In
> addition, recipes may be triggered by this as well, and they too make calls
> to ZooKeeper.updateServerList via ConnectionState.checkTimeouts -->
> ConnectionState.handleNewConnectionString.
> This happens even though the connection string has not actually changed.
> Due to ZOOKEEPER-3825, this can cause the connection to be closed
> immediately. On its own this would be perceived as a glitch. But due to the
> Curator-induced calls, what we see is a cycle of SUSPENDED/RECONNECTED, until
> eventually the session dies and a new session is recreated.
> Based on the source code (at time of writing), ZooKeeper.updateServerList is
> not intended to be called frequently like this.