[ https://issues.apache.org/jira/browse/ZOOKEEPER-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584403#comment-13584403 ]

Alexander Shraer commented on ZOOKEEPER-1618:
---------------------------------------------

Hi Peter, 

It may indeed be better to handle disconnects inside the client library, but 
I'm guessing that by exposing them the application gets more flexibility in the 
way it handles the events. 
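For example, the application can react to each connection state transition
itself. A minimal sketch in Java, assuming the standard org.apache.zookeeper
client API (the class name below is made up for illustration):

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical connection-state watcher: the application decides what to do
// on each state change instead of the library hiding the event.
public class ConnectionStateWatcher implements Watcher {
    @Override
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected:
                // Session may still be alive; the client keeps trying to
                // reconnect. Pause or buffer writes here.
                break;
            case SyncConnected:
                // (Re)connected to some server; resume normal operation.
                break;
            case Expired:
                // Session is gone; a new ZooKeeper handle must be created.
                break;
            default:
                break;
        }
    }
}

// Usage: register the watcher when creating the handle, e.g.
// ZooKeeper zk = new ZooKeeper("host1:2181,host2:2181,host3:2181",
//                              30000, new ConnectionStateWatcher());
{code}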

What I wanted to comment on was actually the second part of your comment 
(processing errors). ZooKeeper itself will not lose correctness when it 
becomes unavailable. So the processing errors you get don't seem like 
ZooKeeper's fault. 

Also, by handling disconnects under the covers the service will not become more 
available - the client may queue requests for you, but it will not execute 
those requests while the leader is being elected. You may similarly queue the 
requests in your application instead of losing them. 
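If that trade-off is acceptable, the queueing can live in the application. A
rough sketch (not part of ZooKeeper's API; BufferingClient and submit() are
invented names) that buffers operations while Disconnected and replays them
once SyncConnected arrives:

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Invented helper: callers submit ZooKeeper operations here instead of
// calling the client directly, so nothing is dropped during an election.
public class BufferingClient implements Watcher {
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
    private volatile boolean connected = false;

    @Override
    public void process(WatchedEvent event) {
        if (event.getState() == KeeperState.SyncConnected) {
            connected = true;
            // Replay whatever was queued while we were disconnected.
            Runnable op;
            while ((op = pending.poll()) != null) {
                op.run();
            }
        } else if (event.getState() == KeeperState.Disconnected) {
            connected = false;
        }
    }

    public void submit(Runnable zkOperation) {
        if (connected) {
            zkOperation.run();
        } else {
            pending.offer(zkOperation);  // runs once SyncConnected arrives
        }
    }
}
{code}

This is deliberately simplified (it ignores the race between the connected
check and enqueueing, and session expiry), but it shows the queueing can sit
on either side of the API boundary without changing what is possible while a
leader is being elected.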

In general, it is theoretically impossible to always guarantee both correctness 
and availability of distributed agreement without making stronger synchrony 
assumptions than the ones made by ZooKeeper or by Paxos (the FLP result). So 
it's just a matter of whether we hide the unavailability or expose it.

Alex
                
> Disconnected event when stopping leader process
> -----------------------------------------------
>
>                 Key: ZOOKEEPER-1618
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1618
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.4, 3.4.5
>         Environment: Linux SLES
> java version "1.6.0_31"
>            Reporter: Peter Nerg
>            Priority: Minor
>
> Running a three node ZK cluster I stop/kill the leader node.
> Immediately, all connected clients receive a Disconnected event; a second 
> or so later a SyncConnected event is received.
> Killing a follower will not produce the same issue/event.
> The application/clients have been implemented to manage Disconnected events 
> so they survive.
> I, however, expected the ZK client to manage the hiccup during the election 
> process. 
> This produces quite a lot of logging in large clusters that have many 
> services relying on ZK.
> In some cases we may lose a few requests, as we need a working ZK cluster to 
> execute those requests.
> IMHO it's not really full high availability if the ZK cluster momentarily 
> takes a dive because the leader goes away.
> No matter how much redundancy one uses in the form of ZK instances, one may 
> still get processing errors during leader election.
> I've verified this behavior in both 3.4.4 and 3.4.5.

