[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585822#comment-13585822
 ] 

Flavio Junqueira commented on ZOOKEEPER-1618:
---------------------------------------------

bq. As I've got it a disconnect event will not expire your sessions, there is a 
separate event "session-expired" that is used to mark that your session is 
invalid.

Correct.

bq. If an entire ensemble is shut down for hours and then restarted my sessions 
are still valid as there is no master/leader to expire them.

That's correct too, but I'm pointing out that a server cannot distinguish a 
situation in which servers are partitioned away from each other for hours, and 
therefore there is no leader, from one in which a single server is partitioned 
away and the rest of the ensemble is making progress.

bq. Taking down the leader will only cause a disconnect followed by an 
immediate connect and since there is no leader no sessions are expired.

Correct.


bq. So I'm not sure I follow the reasoning behind needing to send a disconnect 
so that clients don't think their sessions are ok.

Disconnecting the client will make it look for another server.

bq. I get that from a perspective of a server it may choose to disconnect 
itself from the ensemble due to network/disk issues but that is not really any 
different from killing that server.
So I still don't get why the clients need to know that one of the members is 
gone if the ensemble is still working.

The client doesn't need to know that a member of the ensemble is gone. A client 
needs to know that it must find another server that is either following or 
leading before its session expires, otherwise it might lose ephemerals and such. 
The client learns this through the disconnected event; the precise reason for 
the disconnect is not important to the client.
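
To make the distinction concrete, here is a minimal sketch of how a client 
could track these events. It is a standalone model, not the ZooKeeper client 
library: the class SessionModel, its enums and its methods are hypothetical 
names chosen for illustration, and only the event names mirror the 
Disconnected, SyncConnected and session-expired events discussed in this 
thread.

```java
// Hypothetical model of the client-side session states discussed above.
// It does NOT use the real ZooKeeper client library; all names are assumptions.
final class SessionModel {
    // Events a client may observe, named after the ones in this thread.
    enum Event { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    // Simplified session states.
    enum State { CONNECTING, CONNECTED, CLOSED }

    private State state = State.CONNECTING;
    private boolean ephemeralsValid = true;

    // Apply one event and return the resulting state.
    State onEvent(Event event) {
        if (state == State.CLOSED) {
            return state; // an expired session never comes back
        }
        switch (event) {
            case SYNC_CONNECTED:
                state = State.CONNECTED;
                break;
            case DISCONNECTED:
                // The session is still valid; the client only has to find
                // another server that is following or leading.
                state = State.CONNECTING;
                break;
            case EXPIRED:
                // Only expiration invalidates the session and its ephemerals.
                state = State.CLOSED;
                ephemeralsValid = false;
                break;
        }
        return state;
    }

    boolean ephemeralsValid() {
        return ephemeralsValid;
    }

    public static void main(String[] args) {
        SessionModel session = new SessionModel();
        System.out.println(session.onEvent(Event.SYNC_CONNECTED));
        // Leader stopped: disconnect, then reconnect to another server.
        System.out.println(session.onEvent(Event.DISCONNECTED));
        System.out.println(session.onEvent(Event.SYNC_CONNECTED));
        System.out.println(session.ephemeralsValid());
    }
}
```

In this model a disconnect followed by a reconnect leaves the ephemerals 
intact, which matches the leader-stop case in the report; only an expired 
event closes the session.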

bq. As I gather there are basically three states a session can have.

You have possibly seen this already, but here is a diagram showing the possible 
states and transitions:

http://zookeeper.apache.org/doc/r3.4.5/zookeeperProgrammers.html#ch_zkSessions
                
> Disconnected event when stopping leader process
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-1618
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1618
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.4, 3.4.5
>         Environment: Linux SLES
> java version "1.6.0_31"
>            Reporter: Peter Nerg
>            Priority: Minor
>
> Running a three node ZK cluster I stop/kill the leader node.
> Immediately all connected clients will receive a Disconnected event, a second 
> or so later an event with SyncConnected is received.
> Killing a follower will not produce the same issue/event.
> The application/clients have been implemented to manage Disconnected events 
> so they survive.
> I however expected the ZK client to manage the hiccup during the election 
> process. 
> This produces quite a lot of logging in large clusters that have many 
> services relying on ZK.
> In some cases we may lose a few requests as we need a working ZK cluster to 
> execute those requests.
> IMHO it's not really full high availability if the ZK cluster momentarily 
> takes a dive because the leader goes away.
> No matter how much redundancy one uses in form of ZK instances one still may 
> get processing errors during leader election.
> I've verified this behavior in both 3.4.4 and 3.4.5
