[
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231405#comment-13231405
]
Zhihong Yu commented on HBASE-5549:
-----------------------------------
Minor comments:
For ActiveMasterManager.java, the second line should be indented to the left.
{code}
- String backupZNode = ZKUtil.joinZNode(
+ ClusterStatusTracker clusterStatusTracker) {
{code}
I tried to create a review request on Review Board but got an error 500.
The patch is fairly large, so a Review Board request would be helpful. Please
leave the Bugs field empty so that this JIRA is not flooded.
Nice work.
> Master can fail if ZooKeeper session expires
> --------------------------------------------
>
> Key: HBASE-5549
> URL: https://issues.apache.org/jira/browse/HBASE-5549
> Project: HBase
> Issue Type: Bug
> Components: master, zookeeper
> Affects Versions: 0.96.0
> Environment: all
> Reporter: nkeywal
> Assignee: nkeywal
> Priority: Minor
> Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch,
> 5549.v8.patch, 5549.v9.patch, nochange.patch
>
>
> There is a retry mechanism in RecoverableZooKeeper, but when the session
> expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism
> does not work in this case. This is why a sleep is needed in
> TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher
> to be recreated before using the connection.
> This can happen in real life, for example when:
> - master & zookeeper start
> - the zookeeper connection is cut
> - master enters the retry loop
> - in the meantime the session expires
> - the network comes back, the session is recreated
> - the retries continue, but on the wrong object, hence they fail.
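
The failure sequence described above can be sketched in isolation. This is a minimal illustration only, not the HBase code or the attached patch: `Handle`, `readWithRetry`, and `retrySucceeds` are hypothetical names, and `Handle` merely stands in for a connection object that becomes unusable once its session expires. It assumes the fix direction is to look up the current object on every attempt rather than retrying on a reference captured before the expiry.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class StaleHandleRetryDemo {

    /** Hypothetical stand-in for a ZooKeeper connection; not the HBase or ZooKeeper API. */
    static final class Handle {
        private volatile boolean expired;

        void expire() {
            expired = true;
        }

        String read() {
            if (expired) {
                throw new IllegalStateException("session expired");
            }
            return "data";
        }
    }

    /** Retries a read, asking the supplier for a handle on every attempt. */
    static String readWithRetry(Supplier<Handle> handles, int maxAttempts, Runnable onFailure) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IllegalStateException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return handles.get().read();
            } catch (IllegalStateException e) {
                last = e;
                // Simulates what happens elsewhere while the caller is still retrying.
                onFailure.run();
            }
        }
        throw last;
    }

    /** Returns true if the retry loop eventually succeeds. */
    static boolean retrySucceeds(boolean captureHandleOnce) {
        AtomicReference<Handle> current = new AtomicReference<>(new Handle());
        Handle captured = current.get();
        // The session expires while the caller holds a reference to the old object.
        captured.expire();
        // The owner recreates the connection object after the expiry.
        Runnable recreate = () -> current.set(new Handle());
        // Either keep retrying on the captured object, or re-read the current one each time.
        Supplier<Handle> handles = captureHandleOnce ? () -> captured : current::get;
        try {
            readWithRetry(handles, 3, recreate);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("retry on captured handle succeeds: " + retrySucceeds(true));
        System.out.println("retry with fresh lookup succeeds: " + retrySucceeds(false));
    }
}
```

In this sketch the retry on the captured object exhausts its attempts even though a working replacement exists, while the variant that re-reads the shared reference succeeds on the second attempt.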