[ 
https://issues.apache.org/jira/browse/HADOOP-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HADOOP-8212:
--------------------------------

    Attachment: hadoop-8212-delta-bikas.txt

Here's a delta patch:
- adds the sessionExpired check to the stat callback
- adds a new functional test that tries to trigger this scenario by expiring 
the standby's session while it is monitoring the node.

In practice, when running these tests, I found that the Watcher always triggers 
with SessionExpired before the individual callbacks do. I don't know if this is 
a guarantee of ZK, but it seems to be the current behavior. So we are doubly 
protected: the {{Expired}} handling case in the Watcher calls 
{{rejoinElection()}}, which creates a new {{zkClient}}. The new {{zkClient}} 
consistency check will therefore short-circuit the stale events before they 
even reach the new session expiration checks.
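
To illustrate the double protection described above, here is a minimal sketch. 
It is not the code from the patch and does not use the real ZooKeeper API: 
{{ElectorSketch}}, {{Client}}, {{Code}}, {{onWatcherExpired()}} and 
{{onStatResult()}} are hypothetical stand-ins, and only {{zkClient}} and 
{{rejoinElection()}} are names taken from the discussion. It assumes each 
callback is handed the client that issued the request as its context.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an elector; not the real ActiveStandbyElector. */
class ElectorSketch {
    /** Simplified result codes; a real ZooKeeper client reports many more. */
    enum Code { OK, SESSION_EXPIRED }

    /** Stand-in for a ZooKeeper client handle; each session gets a new one. */
    static final class Client {}

    private Client zkClient = new Client();

    /** Records which branch handled each event, for inspection. */
    final List<String> log = new ArrayList<>();

    synchronized Client currentClient() {
        return zkClient;
    }

    /** Watcher path: on expiry, drop the old client and rejoin the election. */
    synchronized void onWatcherExpired() {
        log.add("watcher: expired, rejoining");
        rejoinElection();
    }

    /** Stat callback path; ctx is the client that issued the request. */
    synchronized void onStatResult(Code rc, Object ctx) {
        if (ctx != zkClient) {
            // Guard 1: the event belongs to an older session, so ignore it.
            log.add("stat: ignored stale client");
            return;
        }
        if (rc == Code.SESSION_EXPIRED) {
            // Guard 2: the callback saw the expiry before the Watcher did.
            log.add("stat: expired, rejoining");
            rejoinElection();
            return;
        }
        log.add("stat: ok");
    }

    /** A new session means a new client object. */
    private void rejoinElection() {
        zkClient = new Client();
    }
}
```

If the Watcher fires first, a later stat callback that still carries the old 
client is dropped by the first guard. If the callback were ever delivered 
first, the second guard would handle the expiry instead of escalating it.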

I also intend to separately write a stress test which should hopefully catch 
races like this, should ZK's behavior change.
                
> Improve ActiveStandbyElector's behavior when session expires
> ------------------------------------------------------------
>
>                 Key: HADOOP-8212
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8212
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.23.3, 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 2.0.0
>
>         Attachments: hadoop-8212-delta-bikas.txt, hadoop-8212.txt, 
> hadoop-8212.txt
>
>
> Currently when the ZK session expires, it results in a fatal error being sent 
> to the application callback. This is not the best behavior -- for example, in 
> the case of HA, if ZK goes down, we would like the current state to be 
> maintained, rather than causing either NN to abort. When the ZK clients are 
> able to reconnect, they should sort out the correct leader based on the 
> normal locking schemes.


        
