[ https://issues.apache.org/jira/browse/HBASE-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919101#action_12919101 ]
HBase Review Board commented on HBASE-2700:
-------------------------------------------
Message from: "Jonathan Gray" <[email protected]>
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/995/
-----------------------------------------------------------
Review request for hbase and stack.
Summary
-------
First go at a unit test of master failover with regions in transition.
Comment from the test method:
/**
* Complex test of master failover that tests as many permutations as possible
* of the different states that regions in transition could be in within ZK.
* <p>
* This tests the proper handling of these states by the failed-over master
* and includes a thorough testing of the timeout code as well.
* <p>
* Starts with a single master and three regionservers.
* <p>
* Creates two tables, enabledTable and disabledTable, each containing 5
* regions. The disabledTable is then disabled.
* <p>
* After reaching steady-state, the master is killed. We then mock several
* states in ZK.
* <p>
* After mocking them, we will start up a new master which should become the
* active master and also detect that it is a failover. The primary test
* passing condition will be that all regions of the enabled table are
* assigned and all the regions of the disabled table are not assigned.
* <p>
* The different scenarios to be tested are below:
* <p>
* <b>ZK State: OFFLINE</b>
* <p>A node can get into OFFLINE state if</p>
* <ul>
* <li>An RS fails to open a region, so it reverts the state back to OFFLINE
* <li>The master is assigning the region to an RS but has not yet sent the RPC
* </ul>
* <p>We will mock the scenarios</p>
* <ul>
* <li>Master has assigned a region of the enabled table but the RS failed, so
* the region is not assigned anywhere and is sitting in ZK as OFFLINE</li>
* <li>This seems to cover both cases?</li>
* </ul>
* <p>
* <b>ZK State: CLOSING</b>
* <p>A node can get into CLOSING state if</p>
* <ul>
* <li>An RS has begun to close a region
* </ul>
* <p>We will mock the scenarios</p>
* <ul>
* <li>Region was being closed but the RS died before finishing the close
* <li>Region of enabled table was being closed but did not complete
* <li>Region of disabled table was being closed but did not complete
* </ul>
* <p>
* <b>ZK State: CLOSED</b>
* <p>A node can get into CLOSED state if</p>
* <ul>
* <li>An RS has finished closing a region but the master has not yet acknowledged it
* </ul>
* <p>We will mock the scenarios</p>
* <ul>
* <li>Region of a table that should be enabled was closed on an RS
* <li>Region of a table that should be disabled was closed on an RS
* </ul>
* <p>
* <b>ZK State: OPENING</b>
* <p>A node can get into OPENING state if</p>
* <ul>
* <li>An RS has begun to open a region
* </ul>
* <p>We will mock the scenarios</p>
* <ul>
* <li>An RS was opening a region of the enabled table but never finished
* </ul>
* <p>
* <b>ZK State: OPENED</b>
* <p>A node can get into OPENED state if</p>
* <ul>
* <li>An RS has finished opening a region but the master has not yet acknowledged it
* </ul>
* <p>We will mock the scenarios</p>
* <ul>
* <li>Region of a table that should be enabled was opened on an RS
* <li>Region of a table that should be disabled was opened on an RS
* <li>Region of a table that should be enabled was opened by a now-dead RS
* <li>Region of a table that should be disabled was opened by a now-dead RS
* </ul>
* <p>
* <b>ZK State: NONE</b>
* <p>A region can be left without a transition node if</p>
* <ul>
* <li>The server hosting the region died and no master processed it
* </ul>
* <p>We will mock the scenarios</p>
* <ul>
* <li>Region of enabled table was on a dead RS that was not yet processed
* <li>Region of disabled table was on a dead RS that was not yet processed
* </ul>
* @throws Exception
*/
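For readers skimming the comment above, the scenario list and the pass condition can be restated as a small table in code. This is only an illustrative sketch, not code from the patch: FailoverScenarioSketch, ZkState, Scenario and expectAssigned are hypothetical names, and nothing here touches HBase or ZooKeeper. It assumes the first CLOSING bullet is a general description covered by the two bullets that follow it, which gives twelve scenarios.

```java
import java.util.ArrayList;
import java.util.List;

public class FailoverScenarioSketch {

    /** ZK transition states named in the comment; NONE means no transition node. */
    public enum ZkState { OFFLINE, CLOSING, CLOSED, OPENING, OPENED, NONE }

    /** Hypothetical holder for one mocked scenario (not an HBase class). */
    public static final class Scenario {
        public final ZkState state;
        public final boolean tableEnabled;
        public final String note;

        Scenario(ZkState state, boolean tableEnabled, String note) {
            this.state = state;
            this.tableEnabled = tableEnabled;
            this.note = note;
        }

        /** Pass condition from the comment: assigned if and only if the table is enabled. */
        public boolean expectAssigned() {
            return tableEnabled;
        }
    }

    /** The scenarios listed in the test comment, in the same order. */
    public static List<Scenario> scenarios() {
        List<Scenario> s = new ArrayList<>();
        s.add(new Scenario(ZkState.OFFLINE, true, "RS failed, region assigned nowhere"));
        s.add(new Scenario(ZkState.CLOSING, true, "close did not complete"));
        s.add(new Scenario(ZkState.CLOSING, false, "close did not complete"));
        s.add(new Scenario(ZkState.CLOSED, true, "closed on an RS"));
        s.add(new Scenario(ZkState.CLOSED, false, "closed on an RS"));
        s.add(new Scenario(ZkState.OPENING, true, "open never finished"));
        s.add(new Scenario(ZkState.OPENED, true, "opened on an RS"));
        s.add(new Scenario(ZkState.OPENED, false, "opened on an RS"));
        s.add(new Scenario(ZkState.OPENED, true, "opened by a now-dead RS"));
        s.add(new Scenario(ZkState.OPENED, false, "opened by a now-dead RS"));
        s.add(new Scenario(ZkState.NONE, true, "on a dead RS not yet processed"));
        s.add(new Scenario(ZkState.NONE, false, "on a dead RS not yet processed"));
        return s;
    }

    public static void main(String[] args) {
        // Print one row per scenario with the outcome expected after failover.
        for (Scenario sc : scenarios()) {
            System.out.printf("%-8s enabled=%-5b expectAssigned=%-5b %s%n",
                sc.state, sc.tableEnabled, sc.expectAssigned(), sc.note);
        }
    }
}
```

Running main prints one row per mocked scenario. The expected outcome depends only on whether the table is enabled, not on the ZK state the region was left in, which is the invariant the comment calls the primary passing condition.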
This addresses bug HBASE-2700.
http://issues.apache.org/jira/browse/HBASE-2700
Diffs
-----
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1005264
trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1005264
trunk/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java 1005264
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1005264
trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1005264
trunk/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java 1005264
trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 1005264
Diff: http://review.cloudera.org/r/995/diff
Testing
-------
running the unit test!
Thanks,
Jonathan
> Handle master failover for regions in transition
> ------------------------------------------------
>
> Key: HBASE-2700
> URL: https://issues.apache.org/jira/browse/HBASE-2700
> Project: HBase
> Issue Type: Sub-task
> Components: master, zookeeper
> Reporter: Jonathan Gray
> Assignee: Jonathan Gray
> Priority: Critical
> Fix For: 0.90.0
>
>
> To this point in HBASE-2692 tasks we have moved everything for regions in
> transition into ZK, but we have not fully handled the master failover case.
> This is to deal with that and to write tests for it.