[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116251#comment-13116251
 ] 

Rakesh R commented on ZOOKEEPER-1209:
-------------------------------------

I will soon upload a patch that follows the approach below. Please let me know 
if there is a better solution.

*Approach:*
IMO it is better for the LES f/w to handle these events (the 'Disconnected', 
'SyncConnected' and 'Expiry' ZooKeeper events) rather than stay silent. This 
lets users move to a safe state instead of remaining in their previous state 
(ELECTED/READY). 

Provide an 'EventProcessor-Thread', one thread per LES. This service will 
execute events after a time-bounded delay. After taking the first event, the 
processor waits for the configured 'eventDelayTimeout' and then picks the 
latest event present in the queue (if any), so that only the most recent event 
is executed. The delay is a grace period that filters out brief network 
fluctuations; the default value of 'eventDelayTimeout' could be 
'sessionTimeOut/2'.
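
A minimal sketch of the delay-and-pick-latest behaviour described above. This is not the patch itself: the names ConnEvent, DelayedEventProcessor, submit() and nextSettledEvent() are made up for illustration, and the enum is only a stand-in for the ZooKeeper connection events.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Stand-in for the three connection events discussed above (hypothetical type). */
enum ConnEvent { DISCONNECTED, SYNC_CONNECTED, EXPIRED }

/** Hypothetical per-LES processor: waits out the grace period, then acts on the newest event only. */
final class DelayedEventProcessor implements Runnable {
    private final BlockingQueue<ConnEvent> queue = new LinkedBlockingQueue<>();
    private final long eventDelayTimeoutMs;
    private final Consumer<ConnEvent> handler;

    DelayedEventProcessor(long eventDelayTimeoutMs, Consumer<ConnEvent> handler) {
        this.eventDelayTimeoutMs = eventDelayTimeoutMs;
        this.handler = handler;
    }

    /** Meant to be called from the watcher callback; the queue is unbounded, so this does not block. */
    void submit(ConnEvent event) {
        queue.add(event);
    }

    /** Blocks for the first event, sleeps for the grace period, then returns the latest queued event. */
    ConnEvent nextSettledEvent() throws InterruptedException {
        ConnEvent latest = queue.take();
        TimeUnit.MILLISECONDS.sleep(eventDelayTimeoutMs);
        ConnEvent newer;
        while ((newer = queue.poll()) != null) {
            latest = newer; // older events are superseded by newer ones
        }
        return latest;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                handler.accept(nextSettledEvent());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit quietly when the LES is stopped
        }
    }
}
```

Under these assumptions the LES would start it with something like new Thread(processor, "EventProcessor-Thread").start(). A 'Disconnected' followed by a 'SyncConnected' within the grace period then results in a single 'SyncConnected' being handled.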

All the watch events ('Disconnected', 'SyncConnected' and 'Expiry') from the 
ZooKeeper server will be given to this processor. It will apply the following 
logic:

+Disconnected logic:+
Introduce a new state, NEUTRAL, to represent the disconnection. Clients will 
see that the node has been disconnected from ZooKeeper and can move to a safe 
mode.
1) If the LeaderElectionSupport state is not STOP, dispatch a NEUTRAL event to 
the user so that the user application can act upon it. This helps the 
application go to a safe state rather than stay in the ELECTED state.
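
The Disconnected rule could look roughly like this. It is only a sketch: LesState, LesListener and DisconnectHandler are hypothetical names, and the enum is a simplified stand-in for the real LES states with the proposed NEUTRAL added.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the LES states; NEUTRAL is the state proposed above. */
enum LesState { READY, ELECTED, NEUTRAL, STOP }

/** Hypothetical callback through which the user application learns about state changes. */
interface LesListener {
    void onStateChange(LesState state);
}

final class DisconnectHandler {
    private LesState state;
    private final List<LesListener> listeners = new ArrayList<>();

    DisconnectHandler(LesState initial) {
        this.state = initial;
    }

    void addListener(LesListener listener) {
        listeners.add(listener);
    }

    LesState state() {
        return state;
    }

    /** On 'Disconnected': unless already stopped, move to NEUTRAL and notify the application. */
    void onDisconnected() {
        if (state == LesState.STOP) {
            return; // a stopped LES has nothing to protect
        }
        state = LesState.NEUTRAL;
        for (LesListener listener : listeners) {
            listener.onStateChange(LesState.NEUTRAL);
        }
    }
}
```

An application listener would typically stop acting as master as soon as it sees NEUTRAL, and resume only after a later ELECTED notification.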

+SyncConnected logic:+
1) Check whether my ephemeral node 'leaderOffer.getnodePath()' is present in 
ZooKeeper.
2) If yes, go to determineElectionStatus(). This will decide the state 
(ELECTED/READY).
3) If no, call makeOffer() and then determineElectionStatus(). This first 
creates the ephemeral node and then goes to the leader determination phase.
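
The three steps above can be sketched as follows. ElectionOps and ReconnectHandler are hypothetical; makeOffer() and determineElectionStatus() are the method names used above, and offerNodeExists() is assumed to wrap an existence check on the offer path (for example ZooKeeper.exists(path, false) != null).

```java
/** Hypothetical view of the LES operations that the steps above rely on. */
interface ElectionOps {
    /** Assumed to check whether the ephemeral offer node is still present in ZooKeeper. */
    boolean offerNodeExists();

    /** Creates the ephemeral offer node. */
    void makeOffer();

    /** Decides between ELECTED and READY. */
    void determineElectionStatus();
}

final class ReconnectHandler {
    private final ElectionOps ops;

    ReconnectHandler(ElectionOps ops) {
        this.ops = ops;
    }

    /** On 'SyncConnected': re-create the offer only if it is gone, then re-run the election check. */
    void onSyncConnected() {
        if (!ops.offerNodeExists()) {
            ops.makeOffer();
        }
        ops.determineElectionStatus();
    }
}
```

Both branches end in determineElectionStatus(), so a node that was moved to NEUTRAL always gets a fresh ELECTED/READY decision after reconnecting.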

+Expiry logic:+
The serving cluster or standalone ZooKeeper has expired this session. This 
implies that the user must create a new client connection (instantiate a new 
ZooKeeper instance) in order to access the ensemble.

1) On receipt of Expiry, dispatch a STOP event to the client. This notifies 
the client, which can then restart the LeaderElectionSupport with a new 
ZooKeeper client session.
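
On the application side, reacting to that STOP event might look like the sketch below. ElectionRun and RestartOnStop are hypothetical names. The supplier is assumed to open a new ZooKeeper client session and wire a new LeaderElectionSupport to it, since the expired session cannot be reused.

```java
import java.util.function.Supplier;

/** Hypothetical handle for one election run bound to one ZooKeeper session. */
interface ElectionRun {
    void start();
}

/** Application-side sketch: every STOP caused by expiry leads to a run on a brand-new session. */
final class RestartOnStop {
    private final Supplier<ElectionRun> newRunOnFreshSession;
    private ElectionRun current;

    RestartOnStop(Supplier<ElectionRun> newRunOnFreshSession) {
        this.newRunOnFreshSession = newRunOnFreshSession;
    }

    /** Builds an election run on a fresh session and starts it. */
    void start() {
        current = newRunOnFreshSession.get();
        current.start();
    }

    /** To be called when the STOP event arrives: drop the old run and start over. */
    void onStop() {
        current = null; // the old run belongs to the expired session
        start();
    }
}
```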

Thanks,
Rakesh
                
> LeaderElection recipe doesn't handle the split-brain issue, n/w disconnection 
> can bring both the client nodes to be in ELECTED
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1209
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1209
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: recipes
>    Affects Versions: 3.3.3
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>
> *Case1-* N/w disconnection can bring both the client nodes into the ELECTED 
> state. The current LeaderElectionSupport(LES) f/w handles only 'NodeDeletion'.
>  
> Consider the scenario where an ELECTED node and a READY node are running. Say 
> the ELECTED node's n/w fails and it is "Disconnected" from ZooKeeper. It will 
> still behave as ELECTED, because it does not get any events from the 
> LeaderElectionSupport(LES) framework.
> After the session timeout, the node in the READY state will be notified by a 
> 'NodeDeleted' event and will go to the ELECTED state.
> *Problem:* 
> Both nodes become ELECTED, so the user sees two Master (ELECTED) nodes, 
> which causes inconsistencies.
> *Case2-* Similarly, say the user has started only one client node and it 
> becomes ELECTED. After some time the n/w connection to the ZooKeeper server 
> is lost and the session expires. 
> *Problem:*
> The client node will still be in the ELECTED state. If the user later starts 
> a second client node, both nodes again become ELECTED.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

