You shouldn't need to recreate the Curator instance on LOST. LOST just
means that the client hasn't been able to connect to ZK for longer than the
session timeout. Curator will keep periodically trying to reestablish a
connection to ZK, and when it succeeds you will see a RECONNECTED event.
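
To illustrate the idea of keeping one client and gating requests on the
latest connection state, here is a rough sketch. It is not Curator code:
the ConnectionTracker class and its State enum are hypothetical stand-ins
so that the example compiles on its own. In a real application you would
feed it from a listener registered with
client.getConnectionStateListenable().addListener(...), using Curator's
own ConnectionState values.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of Curator: remembers the most recent
// connection state so callers can check it before sending a request.
class ConnectionTracker {

    // Stand-in for Curator's ConnectionState (assumption: only the states
    // mentioned in this thread, plus CONNECTED, are modelled here).
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // null until the first state change is observed.
    private final AtomicReference<State> latest = new AtomicReference<>(null);

    // Call this from the connection state listener callback.
    void onStateChanged(State newState) {
        latest.set(newState);
    }

    // True only after CONNECTED or RECONNECTED. False before the first
    // connection and while SUSPENDED or LOST. The client is never
    // recreated; we simply wait for the next RECONNECTED.
    boolean isUsable() {
        State s = latest.get();
        return s == State.CONNECTED || s == State.RECONNECTED;
    }

    public static void main(String[] args) {
        ConnectionTracker tracker = new ConnectionTracker();
        // Replay a sequence similar to the one described in the logs.
        State[] events = {
            State.CONNECTED, State.SUSPENDED, State.RECONNECTED,
            State.SUSPENDED, State.LOST, State.RECONNECTED
        };
        for (State event : events) {
            tracker.onStateChanged(event);
            System.out.println(event + " usable=" + tracker.isUsable());
        }
    }
}
```

With something like this, LOST only flips the flag to false, and the same
client instance becomes usable again once RECONNECTED arrives.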

If the ZK servers are back up and you're not seeing a RECONNECTED event
then it would certainly point to a bug in Curator. If you can reproduce the
problem reliably it would make it much easier to track down.

cheers

On Wed, Oct 25, 2017 at 7:46 PM, Alex Rankin (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/CURATOR-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218259#comment-16218259 ]
>
> Alex Rankin commented on CURATOR-439:
> -------------------------------------
>
> From analysing the log files, it looks like the ConnectionState fluctuated
> between SUSPENDED and RECONNECTED a few times, and was LOST twice. The
> first time the connection was LOST, it RECONNECTED again afterwards. After
> the second time, there were no more ConnectionState changes.
>
> It isn't clear from the documentation, but are we expected to close and
> restart the Curator instance if the ConnectionState is LOST? After looking
> through some other public codebases, it seems that this is the approach
> that others take.
>
> > CuratorFrameworkState STARTED, but ZookeeperClient not connected
> > ----------------------------------------------------------------
> >
> >                 Key: CURATOR-439
> >                 URL: https://issues.apache.org/jira/browse/CURATOR-439
> >             Project: Apache Curator
> >          Issue Type: Bug
> >          Components: Framework
> >    Affects Versions: 3.2.1
> >            Reporter: Alex Rankin
> >            Priority: Minor
> >
> > I recently ran into an issue on some of our nodes caused by network
> > issues between a service and Zookeeper. I have been unable to recreate
> > them as of yet, but I'm still trying.
> > *+Setup+*
> > 5x services using Curator 3.2.1 to talk to Zookeeper 3.5.3 cluster (also
> > 5 nodes).
> > Network issues caused the services to disconnect from Zookeeper.
> > There's a check in our code to see if the Zookeeper connection is
> > available before sending a request:
> > {quote}public boolean isConnected() {
> >     return curatorFramework.getZookeeperClient().isConnected();
> > }
> > {quote}
> > After the network issues resolved, we noticed that all calls to
> > Zookeeper from 4 of the services were still failing (the fifth was fine).
> > Checking the logs, we saw that {{CuratorFramework.getState()}} was
> > reporting the state as STARTED, but
> > {{curatorFramework.getZookeeperClient().isConnected();}} was returning
> > false. Restarting the service fixed everything, but I want to obviously
> > avoid this issue in future.
> > *+Problem+*
> > I couldn't find any documentation stating whether the
> > {{CuratorZookeeperClient.isConnected()}} should be used, or if
> > {{CuratorFramework.getState() == CuratorFrameworkState.STARTED}} (the
> > functionality of the deprecated {{CuratorFramework.isConnected()}}) would
> > be the better check, or if these should both be equivalent, and there's a
> > bug that let one be true while the other was false.
> > If my own check is wrong, and I shouldn't be using
> > {{CuratorZookeeperClient.isConnected()}}, then I can easily fix that. I
> > wanted to check the expected behaviour before diving too deep into this,
> > in case this is normal and I am just using Curator incorrectly.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.4.14#64029)
>
