[
https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586925#comment-13586925
]
Peter Nerg commented on ZOOKEEPER-1618:
---------------------------------------
{quote}The client doesn't need to know that a member of the ensemble is gone. A
client needs to know that it needs to find another server that is either
following or leading before its session expires, otherwise it might lose
ephemerals and such. The client learns it through the disconnected event and it
is not important to the client the precise reason.{quote}
So what you're saying is that killing a leader requires more action on the
part of the client, hence the client needs to be notified via a Disconnected
event. I'm starting to feel slightly daft, but I don't see how that differs
from the scenario where you kill a follower. Any clients attached to the
killed instance will also have to migrate to another live ZK instance (leader
or follower).
Though I guess your answer lies in:
{quote}I'm pointing out that a server cannot distinguish a situation in which
servers are partitioned away from each other for hours, and therefore there is
no leader, from one in which a single server is partitioned away and the rest
of the ensemble is making progress.{quote}
As I gather, the key point is then that the client has no way to tell the
difference between a cluster partition and a temporary loss of the leader.
Now we're getting somewhere, perhaps even my thick skull starts to get the
picture...:)
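For anyone reading along later, here is a minimal sketch of how I understand a
client can ride out the short Disconnected window without dropping requests.
It is an illustration only: the State enum is a local stand-in named after the
Disconnected, SyncConnected and Expired events, not the real ZooKeeper client
API, and DisconnectBuffer, onEvent and submit are hypothetical names. Requests
submitted while disconnected are queued and flushed once the client is
connected again; they are discarded if the session turns out to have expired.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical helper: buffers requests while the client is disconnected.
class DisconnectBuffer {
    // Local stand-in for the client's connection states (assumption, not the real API).
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    private State state = State.DISCONNECTED;
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> sent = new ArrayList<>();

    // Called for every connection state change.
    void onEvent(State newState) {
        state = newState;
        if (newState == State.SYNC_CONNECTED) {
            // Reconnected before the session expired: flush queued requests in order.
            while (!pending.isEmpty()) {
                sent.add(pending.poll());
            }
        } else if (newState == State.EXPIRED) {
            // Session is gone: queued requests can no longer be trusted.
            pending.clear();
        }
    }

    // Sends immediately when connected, queues while disconnected.
    void submit(String request) {
        if (state == State.SYNC_CONNECTED) {
            sent.add(request);
        } else if (state == State.DISCONNECTED) {
            pending.add(request);
        } else {
            throw new IllegalStateException("session expired, a new session is needed");
        }
    }

    List<String> sent() { return sent; }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        DisconnectBuffer buffer = new DisconnectBuffer();
        buffer.onEvent(State.SYNC_CONNECTED);
        buffer.submit("create /a");
        buffer.onEvent(State.DISCONNECTED);   // e.g. the leader was stopped
        buffer.submit("create /b");           // queued, not lost
        buffer.onEvent(State.SYNC_CONNECTED); // reconnected a second or so later
        System.out.println(buffer.sent());    // prints [create /a, create /b]
    }
}
```

The client does not need to know why it was disconnected; it only needs to
hold work back until it is connected again or the session is declared expired.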
So if this is how it behaves due to the explanation above then I got the
answers I wanted.
Though I then expect this to be appropriately documented to avoid future
confusion.
Do you want me to create a new documentation bug or will you just re-use this
one?
> Disconnected event when stopping leader process
> -----------------------------------------------
>
> Key: ZOOKEEPER-1618
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1618
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.4, 3.4.5
> Environment: Linux SLES
> java version "1.6.0_31"
> Reporter: Peter Nerg
> Priority: Minor
>
> Running a three node ZK cluster I stop/kill the leader node.
> Immediately all connected clients will receive a Disconnected event, a second
> or so later an event with SyncConnected is received.
> Killing a follower will not produce the same issue/event.
> The application/clients have been implemented to manage Disconnected events
> so they survive.
> I however expected the ZK client to manage the hiccup during the election
> process.
> This produces quite a lot of logging in large clusters that have many
> services relying on ZK.
> In some cases we may lose a few requests as we need a working ZK cluster to
> execute those requests.
> IMHO it's not really full high availability if the ZK cluster momentarily
> takes a dive because the leader goes away.
> No matter how much redundancy one uses in form of ZK instances one still may
> get processing errors during leader election.
> I've verified this behavior in both 3.4.4 and 3.4.5