[
https://issues.apache.org/jira/browse/HADOOP-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239631#comment-13239631
]
Todd Lipcon commented on HADOOP-8212:
-------------------------------------
bq. I think we want to add similar handling in the StatCallback. It's another
race waiting to happen.
The patch does add the same handling to StatCallback. It uses the ZooKeeper
"context" parameter to pass the original zkClient. Unfortunately the Watcher
interface doesn't have any context object, which is why I had to introduce the
wrapper class there.
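To illustrate the idea, here is a minimal, self-contained sketch. It does not use the real ZooKeeper classes and is not the code from the patch: Client, Watcher, StatCallback and Elector are simplified stand-ins I made up for this example, and only the names WatcherWithClientRef, processWatchEvent and zkClient come from the discussion above. The point is just that the stat callback can compare its context object against the current client, while the watcher has to carry that reference itself.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only. Client, Watcher and StatCallback are simplified stand-ins
 * (assumptions for illustration), not the real ZooKeeper API.
 */
class StaleCallbackSketch {

    /** Stand-in for a ZooKeeper client handle; one instance per session. */
    static final class Client {
        final String name;
        Client(String name) { this.name = name; }
    }

    /** Stand-in watcher interface: note that it receives no context argument. */
    interface Watcher {
        void process(String event);
    }

    /** Stand-in async stat callback: it does receive a caller-supplied context. */
    interface StatCallback {
        void processResult(int rc, String path, Object ctx);
    }

    /** Hypothetical elector that ignores callbacks from clients it has replaced. */
    static final class Elector implements StatCallback {
        private Client zkClient;
        final List<String> handled = new ArrayList<>();

        /** Replaces the current client and returns a watcher bound to the new one. */
        Watcher reconnect(String name) {
            zkClient = new Client(name);
            return new WatcherWithClientRef(this, zkClient);
        }

        Client currentClient() {
            return zkClient;
        }

        /** Watch events arrive here via the wrapper, tagged with their source client. */
        void processWatchEvent(Client source, String event) {
            if (source != zkClient) {
                return; // event from an old session: drop it
            }
            handled.add("watch:" + event);
        }

        /** The context is expected to be the client that issued the request. */
        @Override
        public void processResult(int rc, String path, Object ctx) {
            if (ctx != zkClient) {
                return; // result from an old session: drop it
            }
            handled.add("stat:" + path);
        }
    }

    /** Wrapper that remembers which client a watcher was registered with. */
    static final class WatcherWithClientRef implements Watcher {
        private final Elector elector;
        private final Client client;

        WatcherWithClientRef(Elector elector, Client client) {
            this.elector = elector;
            this.client = client;
        }

        @Override
        public void process(String event) {
            elector.processWatchEvent(client, event);
        }
    }
}
```

In this sketch, a watcher or stat result that belongs to a client created before the latest reconnect() is silently dropped, and only callbacks tied to the current client reach the handling logic.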
bq. The comment on processWatchEvent needs to change slightly to reflect that
it's the proxied watcher callback handler.
Does the following look good?
{code}
- * interface implementation of Zookeeper watch events (connection and node)
+ * interface implementation of Zookeeper watch events (connection and node),
+ * proxied by {@link WatcherWithClientRef}.
{code}
bq. What's the hurry?
In my experience working on similar projects in the past, getting all the
initial code in place is only half the battle. The real work starts once the
code is there and you start banging on it in realistic test scenarios. We'd
like to see automatic failover be a supported piece of the HA solution in
0.23.x (..err..2.0), and to hit that timeline, we need to get into the latter
phase ASAP.
I'm less aggressive when it comes to changing existing code, but since this is
all new code, there's no risk of regressing working features by moving fast
here. Once it starts to stabilize we can afford to slow down the rate of
change. If you'd prefer, I'm happy to create a feature branch for auto-failover
and then call a merge vote when it's ready for the full QA onslaught.
> Improve ActiveStandbyElector's behavior when session expires
> ------------------------------------------------------------
>
> Key: HADOOP-8212
> URL: https://issues.apache.org/jira/browse/HADOOP-8212
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 0.23.3, 0.24.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.23.3, 0.24.0
>
> Attachments: hadoop-8212.txt, hadoop-8212.txt
>
>
> Currently when the ZK session expires, it results in a fatal error being sent
> to the application callback. This is not the best behavior -- for example, in
> the case of HA, if ZK goes down, we would like the current state to be
> maintained, rather than causing either NN to abort. When the ZK clients are
> able to reconnect, they should sort out the correct leader based on the
> normal locking schemes.