[jira] [Commented] (ZOOKEEPER-1618) Disconnected event when stopping leader process

Flavio Junqueira (JIRA) Mon, 25 Feb 2013 03:10:16 -0800

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585803#comment-13585803 ]


Flavio Junqueira commented on ZOOKEEPER-1618:
---------------------------------------------

Hi Peter,

If your issue is with documentation, then perhaps you should mark it as such. 
Currently it is marked as a bug. It is possible that we don't have it 
explicitly explained anywhere. The only reference I could find is the last 
question of the wiki FAQ. 

Let me also add a thought. Suppose that a server gets disconnected from the 
rest of the ensemble because it is slow, because of a network issue, or for 
some other reason. It transitions to the LOOKING state (neither leading nor 
following), while the rest of the ensemble can still make progress. From the 
perspective of the disconnected server, this situation is indistinguishable 
from the leader failing. If we don't disconnect the clients, they might 
believe their sessions are still valid while the ensemble leader has actually 
expired them. 
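To make the argument above concrete, here is a toy Python model (not ZooKeeper code) of a follower that loses contact with the rest of a three-server ensemble. `ServerState`, `Server`, `partition_scenario`, and the `drop_clients_when_looking` switch are hypothetical names chosen for illustration, and the timeout values are arbitrary. The model compares the behavior described above, where a server that goes LOOKING drops its clients, with the alternative of keeping them attached.

```python
from enum import Enum


class ServerState(Enum):
    LOOKING = "LOOKING"
    FOLLOWING = "FOLLOWING"
    LEADING = "LEADING"


class Server:
    """Toy model of one ensemble member and the client sessions attached to it."""

    def __init__(self, state, drop_clients_when_looking=True):
        self.state = state
        # Hypothetical switch: True models the behavior described above,
        # False models the alternative of keeping clients attached.
        self.drop_clients_when_looking = drop_clients_when_looking
        self.clients = set()

    def lose_contact_with_quorum(self):
        """Return the sessions whose clients are told they are Disconnected."""
        # Leader crashed or this server was cut off: locally the two look the same.
        self.state = ServerState.LOOKING
        if self.drop_clients_when_looking:
            dropped = set(self.clients)
            self.clients.clear()
            return dropped
        return set()


def partition_scenario(drop_clients_when_looking, session_timeout=10.0, reconnect_delay=1.0):
    """Cut one follower off from a 3-server ensemble; the other two keep a quorum.

    Returns (client_still_attached, session_expired_by_leader).
    """
    isolated = Server(ServerState.FOLLOWING, drop_clients_when_looking)
    isolated.clients.add("session-1")
    isolated.lose_contact_with_quorum()
    still_attached = "session-1" in isolated.clients
    if still_attached:
        # Nothing prompts the client to move, and the isolated server cannot relay its
        # heartbeats, so the leader expires the session while the client thinks all is well.
        expired = True
    else:
        # The client saw Disconnected and reconnects to a server in the quorum;
        # the session survives if that happens before the timeout.
        expired = reconnect_delay >= session_timeout
    return still_attached, expired


print(partition_scenario(drop_clients_when_looking=False))  # (True, True): misled client
print(partition_scenario(drop_clients_when_looking=True))   # (False, False): session kept
```

With the switch off, the client keeps a connection it believes is healthy while the leader expires its session; with it on, the client sees Disconnected and can move to a server that still belongs to a quorum. This is also consistent with the reported symptom: when the leader itself stops, every remaining server goes LOOKING for the election, so every client sees Disconnected, whereas killing a follower only affects the clients attached to it.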

> Disconnected event when stopping leader process
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-1618
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1618
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.4, 3.4.5
>         Environment: Linux SLES
> java version "1.6.0_31"
>            Reporter: Peter Nerg
>            Priority: Minor
>
> Running a three-node ZK cluster, I stop/kill the leader node.
> Immediately all connected clients receive a Disconnected event; a second 
> or so later a SyncConnected event is received.
> Killing a follower does not produce the same issue/event.
> The application/clients have been implemented to manage Disconnected events 
> so they survive.
> However, I expected the ZK client to handle the hiccup during the election 
> process.
> This produces quite a lot of logging in large clusters that have many 
> services relying on ZK.
> In some cases we may lose a few requests, as we need a working ZK cluster to 
> execute those requests.
> IMHO it's not really full high availability if the ZK cluster momentarily 
> takes a dive because the leader goes away.
> No matter how much redundancy one uses in the form of ZK instances, one may 
> still get processing errors during leader election.
> I've verified this behavior in both 3.4.4 and 3.4.5.
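The reporter's clients already survive Disconnected events. One common pattern, sketched here as a toy Python model, is to treat Disconnected as transient (the session may still be alive), hold new requests until SyncConnected arrives, and treat only Expired as fatal. `KeeperState` below only mirrors three state names from the Java client's `Watcher.Event.KeeperState`; `ResilientClient`, `submit`, `on_state`, and `max_pending` are hypothetical, and the sketch does not talk to a real ensemble.

```python
from collections import deque
from enum import Enum


class KeeperState(Enum):
    # Mirrors a subset of the Java client's KeeperState names; a model, not the real API.
    SYNC_CONNECTED = "SyncConnected"
    DISCONNECTED = "Disconnected"
    EXPIRED = "Expired"


class ResilientClient:
    """Hypothetical wrapper: hold requests while Disconnected, replay on SyncConnected,
    fail them on Expired."""

    def __init__(self, execute, max_pending=100):
        self._execute = execute          # callable that performs one request
        self._max_pending = max_pending
        self._pending = deque()
        self.state = KeeperState.SYNC_CONNECTED
        self.failed = []                 # rejected: session expired or buffer full

    def submit(self, request):
        if self.state is KeeperState.SYNC_CONNECTED:
            self._execute(request)
        elif self.state is KeeperState.DISCONNECTED and len(self._pending) < self._max_pending:
            # Transient: the session may still be alive, so wait for the reconnect.
            self._pending.append(request)
        else:
            self.failed.append(request)

    def on_state(self, state):
        self.state = state
        if state is KeeperState.SYNC_CONNECTED:
            while self._pending:
                self._execute(self._pending.popleft())
        elif state is KeeperState.EXPIRED:
            # The session, its ephemeral nodes and its watches are gone; the application
            # must open a new session and rebuild that state, so drop what was queued.
            self.failed.extend(self._pending)
            self._pending.clear()


done = []
client = ResilientClient(done.append)
client.submit("req-1")                       # executed immediately
client.on_state(KeeperState.DISCONNECTED)    # e.g. the leader was killed, election running
client.submit("req-2")                       # held instead of failed
client.on_state(KeeperState.SYNC_CONNECTED)  # a second or so later
print(done)  # ['req-1', 'req-2']
```

The buffer only covers requests not yet sent. In the real client, a request that is in flight when the connection drops fails with a connection-loss error and may or may not have been applied, so non-idempotent operations still need application-level checks.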
