[
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228487#comment-13228487
]
stack commented on HBASE-5572:
------------------------------
Above sounds good. Would suggest we retain the test that verifies that we
expire znode if same host and port. That behavior can be useful in the
single-master case.
> KeeperException.SessionExpiredException management could be improved in Master
> ------------------------------------------------------------------------------
>
> Key: HBASE-5572
> URL: https://issues.apache.org/jira/browse/HBASE-5572
> Project: HBase
> Issue Type: Improvement
> Components: master
> Affects Versions: 0.96.0
> Reporter: nkeywal
> Assignee: nkeywal
> Priority: Minor
> Attachments: 5572.v1.patch
>
>
> Synthesis:
> 1) TestMasterZKSessionRecovery distinguish two cases on
> SessionExpiredException. One is explicitly not managed. However, is seems
> that there is no reason for this.
> 2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a
> quite complex function, with a useless recursive call.
> 3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is
> equivalent to TestZooKeeper#testMasterSessionExpired
> 4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be
> removed if we merge the two cases mentioned above.
> Changes are:
> 2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a
> single case and remove recursion
> 1) Removing TestMasterZKSessionRecovery
> Detailed justification:
> testMasterZKSessionRecoveryFailure says:
> {noformat}
> /**
> * Negative test of master recovery from zk session expiry.
> *
> * Starts with one master. Fakes the master zk session expired.
> * Ensures the master cannot recover the expired zk session since
> * the master zk node is still there.
> */
> public void testMasterZKSessionRecoveryFailure() throws Exception {
> MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
> HMaster m = cluster.getMaster();
> m.abort("Test recovery from zk session expired",
> new KeeperException.SessionExpiredException());
> assertTrue(m.isStopped());
> }
> {noformat}
> This tests works, i.e. the assertion is always verified.
> But do we really want this behavior?
> When looking at the code, we see that this what's happening is strange:
> - HMaster#abort calls Master#abortNow. If HMaster#abortNow returns false
> HMaster#abort stops the master.
> - HMaster#abortNow checks the exception type. As it's a
> SessionExpiredException it will try to recover, calling
> HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false
> (and that will make HMaster#abort stopping the master)
> - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and
> then try to become the active master. If it cannot, it will return false (and
> that will make HMaster#abort stopping the master).
> - HMaster#becomeActiveMaster returns the result of
> ActiveMasterManager#blockUntilBecomingActiveMaster.
> blockUntilBecomingActiveMaster says it will return false if there is any
> error preventing it to become the active master.
> - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master
> address. If it's the same port & host, it deletes the nodes, that will start
> a recursive call to blockUntilBecomingActiveMaster. This second call succeeds
> (we became the active master) and return true. This result is ignored by the
> first blockUntilBecomingActiveMaster: it return false (even if we actually
> became the active master), hence the whole suite call returns false and
> HMaster#abort stops the master.
> In other words, the comment says "Ensures the master cannot recover the
> expired zk session since the master zk node is still there." but we're
> actually doing a check just for this and deleting the node. If we were not
> ignoring the result, we would return "true", so we would not stop the master,
> so the test would fail.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira