> On 2010-10-08 13:43:59, stack wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 267
> > <http://review.cloudera.org/r/995/diff/1/?file=14445#file14445line267>
> >
> > When would this happen?

   * <b>ZK State: OFFLINE</b>
   * <p>A node can get into OFFLINE state if</p>
   * <ul>
   * <li>An RS fails to open a region, so it reverts the state back to OFFLINE
   * <li>The Master is assigning the region to a RS before it sends RPC
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Master has assigned an enabled region but RS failed so a region is
   *     not assigned anywhere and is sitting in ZK as OFFLINE</li>
   * <li>This seems to cover both cases?</li>
   * </ul>


> On 2010-10-08 13:43:59, stack wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java, line 675
> > <http://review.cloudera.org/r/995/diff/1/?file=14448#file14448line675>
> >
> > Don't we have this in AssignmentManager already?
> > isRegionsInTransition I believe it's called.
> >
> > There is white space added at end of the two @throws lines.

This tests ZK, not the RIT map on the master. So for unit tests, you're testing two different things. Since I'm mocking data up in ZK, I wanted to ensure nothing was left in ZK.


> On 2010-10-08 13:43:59, stack wrote:
> > trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java, line 462
> > <http://review.cloudera.org/r/995/diff/1/?file=14451#file14451line462>
> >
> > What about the case where not all regions have been assigned -- say the
> > master was killed mid-startup before all regions mentioned in .META. had
> > been assigned by master? There should be a fixup where we compare the
> > difference? Can we even handle this case? We'd need to ask RSs what
> > they are holding?

IMO we don't need to support this (for now). I think it is acceptable that nothing can fail during a startup. If the master dies or an RS dies during initial startup, you have to restart. I think RS deaths may even work fine, but I think it's okay to have a SPOF during startup.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: http://review.cloudera.org/r/995/#review1496
-----------------------------------------------------------


On 2010-10-07 16:34:04, Jonathan Gray wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/995/
> -----------------------------------------------------------
>
> (Updated 2010-10-07 16:34:04)
>
>
> Review request for hbase and stack.
>
>
> Summary
> -------
>
> First go at a unit test of master failover with regions in transition.
>
> Comment from the test method:
>
>   /**
>    * Complex test of master failover that tests as many permutations of the
>    * different possible states that regions in transition could be in within ZK.
>    * <p>
>    * This tests the proper handling of these states by the failed-over master
>    * and includes a thorough testing of the timeout code as well.
>    * <p>
>    * Starts with a single master and three regionservers.
>    * <p>
>    * Creates two tables, enabledTable and disabledTable, each containing 5
>    * regions. The disabledTable is then disabled.
>    * <p>
>    * After reaching steady-state, the master is killed. We then mock several
>    * states in ZK.
>    * <p>
>    * After mocking them, we will startup a new master which should become the
>    * active master and also detect that it is a failover. The primary test
>    * passing condition will be that all regions of the enabled table are
>    * assigned and all the regions of the disabled table are not assigned.
>    * <p>
>    * The different scenarios to be tested are below:
>    * <p>
>    * <b>ZK State: OFFLINE</b>
>    * <p>A node can get into OFFLINE state if</p>
>    * <ul>
>    * <li>An RS fails to open a region, so it reverts the state back to OFFLINE
>    * <li>The Master is assigning the region to a RS before it sends RPC
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Master has assigned an enabled region but RS failed so a region is
>    *     not assigned anywhere and is sitting in ZK as OFFLINE</li>
>    * <li>This seems to cover both cases?</li>
>    * </ul>
>    * <p>
>    * <b>ZK State: CLOSING</b>
>    * <p>A node can get into CLOSING state if</p>
>    * <ul>
>    * <li>An RS has begun to close a region
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Region was being closed but the RS died before finishing the close
>    * <li>Region of enabled table was being closed but did not complete
>    * <li>Region of disabled table was being closed but did not complete
>    * </ul>
>    * <p>
>    * <b>ZK State: CLOSED</b>
>    * <p>A node can get into CLOSED state if</p>
>    * <ul>
>    * <li>An RS has completed closing a region but not acknowledged by master yet
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Region of a table that should be enabled was closed on an RS
>    * <li>Region of a table that should be disabled was closed on an RS
>    * </ul>
>    * <p>
>    * <b>ZK State: OPENING</b>
>    * <p>A node can get into OPENING state if</p>
>    * <ul>
>    * <li>An RS has begun to open a region
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>RS was opening a region of enabled table but never finishes
>    * </ul>
>    * <p>
>    * <b>ZK State: OPENED</b>
>    * <p>A node can get into OPENED state if</p>
>    * <ul>
>    * <li>An RS has finished opening a region but not acknowledged by master yet
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Region of a table that should be enabled was opened on an RS
>    * <li>Region of a table that should be disabled was opened on an RS
>    * <li>Region of a table that should be enabled was opened by a now-dead RS
>    * <li>Region of a table that should be disabled was opened by a now-dead RS
>    * </ul>
>    * <p>
>    * <b>ZK State: NONE</b>
>    * <p>A region could not have a transition node if</p>
>    * <ul>
>    * <li>The server hosting the region died and no master processed it
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Region of enabled table was on a dead RS that was not yet processed
>    * <li>Region of disabled table was on a dead RS that was not yet processed
>    * </ul>
>    * @throws Exception
>    */
>
>
> This addresses bug HBASE-2700.
>     http://issues.apache.org/jira/browse/HBASE-2700
>
>
> Diffs
> -----
>
>   trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1005264
>   trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1005264
>   trunk/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java 1005264
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1005264
>   trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1005264
>   trunk/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java 1005264
>   trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 1005264
>
> Diff: http://review.cloudera.org/r/995/diff
>
>
> Testing
> -------
>
> running the unit test!
>
>
> Thanks,
>
> Jonathan
>
>
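
For reference, the scenario list in the quoted test comment can be written down as a small table in plain Java. This is only a sketch under assumptions, not code from the patch: FailoverScenarios, ZkState, Scenario, expectAssigned, and fromTestComment are hypothetical names, nothing here calls HBase or ZooKeeper, and the dead-RS variants are collapsed into their enabled/disabled counterparts. The only rule it encodes is the primary passing condition stated in the comment: after failover, regions of the enabled table are assigned and regions of the disabled table are not.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the scenario matrix; none of these names are HBase APIs. */
final class FailoverScenarios {

  /** Transition states named in the test comment; NONE means no transition node in ZK. */
  enum ZkState { OFFLINE, CLOSING, CLOSED, OPENING, OPENED, NONE }

  /** One mocked region: its ZK state and whether its table is enabled. */
  static final class Scenario {
    final ZkState state;
    final boolean tableEnabled;

    Scenario(ZkState state, boolean tableEnabled) {
      this.state = state;
      this.tableEnabled = tableEnabled;
    }
  }

  /** Primary passing condition from the comment: assigned if and only if the table is enabled. */
  static boolean expectAssigned(Scenario s) {
    return s.tableEnabled;
  }

  /** Scenarios as listed in the test comment (dead-RS variants collapsed). */
  static List<Scenario> fromTestComment() {
    List<Scenario> out = new ArrayList<>();
    out.add(new Scenario(ZkState.OFFLINE, true));   // enabled region, RS failed to open
    out.add(new Scenario(ZkState.CLOSING, true));   // enabled region, close did not complete
    out.add(new Scenario(ZkState.CLOSING, false));  // disabled region, close did not complete
    out.add(new Scenario(ZkState.CLOSED, true));    // enabled region closed on an RS
    out.add(new Scenario(ZkState.CLOSED, false));   // disabled region closed on an RS
    out.add(new Scenario(ZkState.OPENING, true));   // enabled region, open never finishes
    out.add(new Scenario(ZkState.OPENED, true));    // enabled region opened on an RS
    out.add(new Scenario(ZkState.OPENED, false));   // disabled region opened on an RS
    out.add(new Scenario(ZkState.NONE, true));      // enabled region on an unprocessed dead RS
    out.add(new Scenario(ZkState.NONE, false));     // disabled region on an unprocessed dead RS
    return out;
  }

  private FailoverScenarios() {}
}
```

A real test would replace expectAssigned with a check against the cluster after the new master becomes active; the table above only makes it easier to see which state and table combinations the comment covers and which (for example OFFLINE or OPENING on the disabled table) it does not list.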
