[
https://issues.apache.org/jira/browse/HADOOP-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HADOOP-8212:
--------------------------------
Attachment: hadoop-8212-delta-bikas.txt
Here's a delta patch:
- adds the sessionExpired check to the stat callback
- adds a new functional test that tries to trigger this scenario by expiring
the standby's session while it is monitoring the node.
In practice, when running these tests, I found that the Watcher always triggers
with SessionExpired before the individual callbacks do. I don't know if this is
a guarantee of ZK, but it seems to be the current behavior. So we are doubly
protected: the {{Expired}} handling case in the Watcher calls
{{rejoinElection()}}, which makes a new {{zkClient}}. The new check against
{{zkClient}} consistency then short-circuits the events before they even reach
the new session expiration checks.
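
The double protection described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not the code in the patch:
{{FakeZkClient}}, {{StaleClientSketch}}, {{onWatcherExpired()}} and
{{onStatResult()}} are hypothetical names, and no real ZooKeeper classes are
used. Only {{rejoinElection()}} and {{zkClient}} come from the comment above.
The sketch assumes each async call passes the issuing client as its context
object, so a callback can tell whether it belongs to the current session.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a ZooKeeper handle: one instance per session.
final class FakeZkClient {
    final long sessionId;
    FakeZkClient(long sessionId) { this.sessionId = sessionId; }
}

// Hypothetical, simplified elector; not the real ActiveStandbyElector.
class StaleClientSketch {
    // Same integer value as ZooKeeper's KeeperException.Code.SESSIONEXPIRED.
    static final int SESSION_EXPIRED = -112;
    static final int OK = 0;

    private FakeZkClient zkClient;
    private long nextSessionId = 1;
    final List<String> log = new ArrayList<>();

    StaleClientSketch() {
        zkClient = new FakeZkClient(nextSessionId++);
    }

    synchronized FakeZkClient currentClient() {
        return zkClient;
    }

    // First line of defense: the Watcher sees Expired and rejoins.
    synchronized void onWatcherExpired() {
        log.add("watcher: expired");
        rejoinElection();
    }

    // Replaces the client, which makes every older client "stale".
    synchronized void rejoinElection() {
        zkClient = new FakeZkClient(nextSessionId++);
        log.add("rejoin: session " + zkClient.sessionId);
    }

    // Stat callback. ctx is assumed to be the client that issued the request.
    synchronized void onStatResult(int rc, Object ctx) {
        if (ctx != zkClient) {
            // Consistency check: results from an old client are dropped here,
            // before the session-expired check below is ever reached.
            log.add("stat: ignored stale result");
            return;
        }
        if (rc == SESSION_EXPIRED) {
            // Second line of defense, in case the callback arrives first.
            log.add("stat: expired");
            rejoinElection();
            return;
        }
        log.add("stat: rc=" + rc);
    }

    public static void main(String[] args) {
        StaleClientSketch elector = new StaleClientSketch();
        FakeZkClient old = elector.currentClient();
        elector.onWatcherExpired();                  // Watcher fires first
        elector.onStatResult(SESSION_EXPIRED, old);  // late callback, old client
        elector.log.forEach(System.out::println);
    }
}
```

If the Watcher fires first, the late stat callback carries the old client and
is dropped by the consistency check. If the callback were ever delivered first,
the session-expired check would trigger the rejoin instead.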
I also intend to separately write a stress test which should hopefully catch
races like this, should ZK's behavior change.
> Improve ActiveStandbyElector's behavior when session expires
> ------------------------------------------------------------
>
> Key: HADOOP-8212
> URL: https://issues.apache.org/jira/browse/HADOOP-8212
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 0.23.3, 0.24.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 2.0.0
>
> Attachments: hadoop-8212-delta-bikas.txt, hadoop-8212.txt,
> hadoop-8212.txt
>
>
> Currently when the ZK session expires, it results in a fatal error being sent
> to the application callback. This is not the best behavior -- for example, in
> the case of HA, if ZK goes down, we would like the current state to be
> maintained, rather than causing either NN to abort. When the ZK clients are
> able to reconnect, they should sort out the correct leader based on the
> normal locking schemes.