[
https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585814#comment-13585814
]
Peter Nerg commented on ZOOKEEPER-1618:
---------------------------------------
Well, my issue is the disconnect, which I feel is unnecessary.
But as I've mentioned earlier, I wanted to clear up whether it is intended to
work this way (in which case the documentation should be fixed) or whether it is
possible to fix the client to deal with the leader taking a dive.
I'd obviously prefer that the ZK client deal with it, as it makes my life
simpler... :)
I'm not sure I follow your (Flavio's) reasoning that the clients have to be
disconnected because otherwise they may think that their sessions are OK.
As I understand it, a disconnect event will not expire your session; there is a
separate "session expired" event that marks your session as invalid.
If an entire ensemble is shut down for hours and then restarted, my sessions are
still valid, as there is no master/leader to expire them.
Taking down the leader will only cause a disconnect followed by an immediate
reconnect, and since there is no leader, no sessions are expired.
So I'm not sure I follow the reasoning behind needing to send a disconnect so
that clients don't think their sessions are OK.
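A minimal Java sketch of the expiry model argued above, assuming (as the comment does) that only a running leader expires sessions and that a newly elected leader gives every known session a fresh timeout. LeaderSessionModel and its methods are hypothetical and greatly simplified; this is not ZooKeeper's server code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the argument above; not ZooKeeper's server code.
class LeaderSessionModel {
    private final Map<Long, Long> timeoutMs = new HashMap<>();  // sessionId -> session timeout
    private final Map<Long, Long> deadlineMs = new HashMap<>(); // sessionId -> expiry deadline
    private boolean leaderRunning = true;

    void createSession(long id, long timeout, long now) {
        timeoutMs.put(id, timeout);
        deadlineMs.put(id, now + timeout);
    }

    // A client heartbeat pushes the session's deadline forward.
    void heartbeat(long id, long now) {
        Long timeout = timeoutMs.get(id);
        if (timeout != null) {
            deadlineMs.put(id, now + timeout);
        }
    }

    void leaderDown() {
        leaderRunning = false;
    }

    // Assumption: a newly elected leader grants every known session a full, fresh timeout.
    void leaderElected(long now) {
        leaderRunning = true;
        for (Map.Entry<Long, Long> e : timeoutMs.entrySet()) {
            deadlineMs.put(e.getKey(), now + e.getValue());
        }
    }

    // Periodic expiry check; with no leader running, nothing is expired.
    void tick(long now) {
        if (!leaderRunning) {
            return;
        }
        deadlineMs.entrySet().removeIf(e -> now > e.getValue());
        timeoutMs.keySet().retainAll(deadlineMs.keySet());
    }

    boolean isAlive(long id) {
        return timeoutMs.containsKey(id);
    }
}
```

In this model an hour with no leader does not expire a session; only missing heartbeats after the new leader has started do.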
I get that, from the perspective of a server, it may choose to disconnect itself
from the ensemble due to network/disk issues, but that is not really any
different from killing that server.
So I still don't get why the clients need to know that one of the members is
gone if the ensemble is still working.
As I gather, there are basically three states a session can be in:
- connected
- disconnected (the ZK client reconnects automatically; all session state is kept)
- session expired (requires the application to reconnect and re-establish its state)
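The three states can be sketched as a small state machine. This is a hypothetical, library-free model: the event names loosely mirror the KeeperState values a real client sees in Watcher.process(WatchedEvent) via WatchedEvent.getState() (SyncConnected, Disconnected, Expired), but SessionStateMachine is not a ZooKeeper class.

```java
// Hypothetical model; SessionEvent loosely mirrors Watcher.Event.KeeperState
// (SyncConnected, Disconnected, Expired). No ZooKeeper classes are used.
enum SessionEvent { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

enum SessionState { CONNECTED, DISCONNECTED, EXPIRED }

class SessionStateMachine {
    private SessionState state = SessionState.CONNECTED;

    SessionState state() {
        return state;
    }

    // Only an expired session forces the application to build a new session
    // and re-create its ephemeral nodes and watches.
    boolean needsNewSession() {
        return state == SessionState.EXPIRED;
    }

    void onEvent(SessionEvent event) {
        if (state == SessionState.EXPIRED) {
            return; // an expired session never becomes valid again
        }
        switch (event) {
            case SYNC_CONNECTED:
                state = SessionState.CONNECTED;
                break;
            case DISCONNECTED:
                // The client library keeps reconnecting on its own; the session survives.
                state = SessionState.DISCONNECTED;
                break;
            case EXPIRED:
                state = SessionState.EXPIRED;
                break;
        }
    }
}
```

The point the comment makes shows up in needsNewSession(): a Disconnected/SyncConnected round trip, such as the one seen during leader election, never requires the application to rebuild anything.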
> Disconnected event when stopping leader process
> -----------------------------------------------
>
> Key: ZOOKEEPER-1618
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1618
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.4, 3.4.5
> Environment: Linux SLES
> java version "1.6.0_31"
> Reporter: Peter Nerg
> Priority: Minor
>
> Running a three-node ZK cluster, I stop/kill the leader node.
> Immediately, all connected clients receive a Disconnected event, and a second
> or so later an event with SyncConnected is received.
> Killing a follower does not produce the same issue/event.
> The application/clients have been implemented to handle Disconnected events,
> so they survive.
> However, I expected the ZK client to handle the hiccup during the election
> process.
> This produces quite a lot of logging in large clusters that have many
> services relying on ZK.
> In some cases we may lose a few requests, as we need a working ZK cluster to
> execute those requests.
> IMHO it's not really full high availability if the ZK cluster momentarily
> takes a dive because the leader goes away.
> No matter how much redundancy one uses in the form of ZK instances, one may
> still get processing errors during leader election.
> I've verified this behavior in both 3.4.4 and 3.4.5.
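One way an application can avoid dropping requests during the short Disconnected window is to park them and replay them after SyncConnected. The sketch below assumes requests are plain Runnables and that the application forwards connection events to it; DeferredRequestQueue is a hypothetical helper, not part of the ZooKeeper client.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: holds work issued during a
// Disconnected window and replays it once the client is SyncConnected again.
class DeferredRequestQueue {
    private final Deque<Runnable> pending = new ArrayDeque<>();
    private boolean connected = true;

    // Run the request now if connected, otherwise park it until reconnect.
    synchronized void submit(Runnable request) {
        if (connected) {
            request.run();
        } else {
            pending.addLast(request);
        }
    }

    synchronized void onDisconnected() {
        connected = false;
    }

    // Replay parked requests in submission order (simplified: runs them under the lock).
    synchronized void onSyncConnected() {
        connected = true;
        while (!pending.isEmpty()) {
            pending.pollFirst().run();
        }
    }

    // After session expiry the parked work cannot be replayed on the old session;
    // hand it back so the caller can re-issue it against a new session.
    synchronized List<Runnable> onExpired() {
        connected = false;
        List<Runnable> dropped = new ArrayList<>(pending);
        pending.clear();
        return dropped;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Replaying is only safe for idempotent operations: a request that was in flight when the connection dropped may or may not have been applied (the ZooKeeper client reports this as a ConnectionLossException), so non-idempotent writes need their own check before a retry.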
--
This message is automatically generated by JIRA.