[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585814#comment-13585814
 ] 

Peter Nerg commented on ZOOKEEPER-1618:
---------------------------------------

Well, my issue is the disconnect, which I feel is unnecessary.
But as I've mentioned earlier, I wanted to clarify whether it is intended to 
work this way (in which case the documentation should be fixed) or whether it 
is possible to fix the client to deal with the leader taking a dive.
I'd obviously prefer that the ZK client deals with it, as it makes my life 
simpler...:)

I'm not sure I follow your (Flavio) reasoning that the clients have to be 
disconnected as otherwise they may think that their sessions are ok.
As I understand it, a disconnect event will not expire your sessions; there is 
a separate event, "session-expired", that is used to mark that your session is 
invalid.
If an entire ensemble is shut down for hours and then restarted, my sessions 
are still valid, as there is no master/leader to expire them.
Taking down the leader will only cause a disconnect followed by an immediate 
reconnect, and since there is no leader in the meantime, no sessions are 
expired.
So I'm not sure I follow the reasoning behind needing to send a disconnect so 
that clients don't think their sessions are ok.

I get that, from a server's perspective, it may choose to disconnect itself 
from the ensemble due to network/disk issues, but that is not really any 
different from killing that server.
So I still don't get why the clients need to know that one of the members is 
gone if the ensemble is still working.

As I gather, there are basically three states a session can have:
- connected
- disconnected (ZK client will auto-reconnect, all session state is kept)
- session expired (requires the application to reconnect and re-establish 
  its session)
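
To make the distinction concrete, here is a minimal sketch of how an 
application might react to those three states. It is a plain-Java model, not 
ZooKeeper code: SessionTracker and its Event/State enums are hypothetical 
names made up for illustration. In the real Java client the corresponding 
notifications arrive as Watcher.Event.KeeperState values (SyncConnected, 
Disconnected, Expired). The sketch assumes, as described above, that a 
disconnect keeps the session and only an expiry forces the application to 
start over.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the three session states listed above.
 * Not part of the ZooKeeper API; all names are illustrative only.
 */
final class SessionTracker {

    /** Notifications, named after the client events discussed in this thread. */
    enum Event { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    /** The three states a session can be in. */
    enum State { CONNECTED, DISCONNECTED, EXPIRED }

    // No connection exists until the first SYNC_CONNECTED arrives.
    private State state = State.DISCONNECTED;
    private final List<String> actions = new ArrayList<>();

    State state() { return state; }

    List<String> actions() { return actions; }

    /** True only when the application itself has to create a new session. */
    boolean needsNewSession() { return state == State.EXPIRED; }

    void onEvent(Event event) {
        if (state == State.EXPIRED) {
            // Assumption: expiry is terminal for this session, so later
            // events for it are ignored.
            actions.add("ignore " + event);
            return;
        }
        switch (event) {
            case SYNC_CONNECTED:
                state = State.CONNECTED;
                actions.add("resume requests");
                break;
            case DISCONNECTED:
                // Session state is kept; just hold requests until reconnected.
                state = State.DISCONNECTED;
                actions.add("pause requests");
                break;
            case EXPIRED:
                state = State.EXPIRED;
                actions.add("create new session");
                break;
        }
    }

    public static void main(String[] args) {
        // Event sequence seen by a client when the leader is stopped.
        SessionTracker tracker = new SessionTracker();
        tracker.onEvent(Event.SYNC_CONNECTED);
        tracker.onEvent(Event.DISCONNECTED);
        tracker.onEvent(Event.SYNC_CONNECTED);
        System.out.println(tracker.state() + " " + tracker.actions());
    }
}
```

With this model, the leader-kill sequence only pauses and resumes requests; 
needsNewSession() stays false unless an expiry is actually delivered.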

                
> Disconnected event when stopping leader process
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-1618
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1618
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.4, 3.4.5
>         Environment: Linux SLES
> java version "1.6.0_31"
>            Reporter: Peter Nerg
>            Priority: Minor
>
> Running a three-node ZK cluster, I stop/kill the leader node.
> Immediately, all connected clients receive a Disconnected event; a second 
> or so later, an event with SyncConnected is received.
> Killing a follower will not produce the same issue/event.
> The application/clients have been implemented to manage Disconnected events 
> so they survive.
> I, however, expected the ZK client to manage the hiccup during the election 
> process.
> This produces quite a lot of logging in large clusters that have many 
> services relying on ZK.
> In some cases we may lose a few requests, as we need a working ZK cluster to 
> execute those requests.
> IMHO it's not really full high availability if the ZK cluster momentarily 
> takes a dive because the leader goes away.
> No matter how much redundancy one uses in the form of ZK instances, one may 
> still get processing errors during leader election.
> I've verified this behavior in both 3.4.4 and 3.4.5.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
