> On 2010-10-08 13:43:59, stack wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 267
> > <http://review.cloudera.org/r/995/diff/1/?file=14445#file14445line267>
> >
> >     When would this happen?

   * <b>ZK State:  OFFLINE</b>
   * <p>A node can get into OFFLINE state if</p>
   * <ul>
   * <li>An RS fails to open a region, so it reverts the state back to OFFLINE
   * <li>The Master is assigning the region to a RS before it sends RPC
   * </ul>
   * <p>We will mock the scenarios</p>
   * <ul>
   * <li>Master has assigned an enabled region but RS failed so a region is
   *     not assigned anywhere and is sitting in ZK as OFFLINE</li>
   * <li>This seems to cover both cases?</li>
   * </ul>
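
A minimal sketch of why one mocked scenario seems to cover both cases: either
path leaves the node in the same OFFLINE state in ZK.  This is a toy model, not
the ZKAssign or AssignmentManager code.  The class, the method names and the
assumption that the RS reverts from OPENING are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of region transition nodes; every name here is hypothetical. */
public class OfflineStateSketch {
    public enum State { OFFLINE, OPENING, OPENED }

    // Stand-in for the transition nodes in ZK: region name -> state.
    private final Map<String, State> nodes = new HashMap<>();

    /** Master side: create the node as OFFLINE before sending the open RPC. */
    public void masterStartsAssign(String region) {
        nodes.put(region, State.OFFLINE);
    }

    /** RS side: begin opening; only valid when the node is OFFLINE. */
    public void rsBeginsOpen(String region) {
        requireState(region, State.OFFLINE);
        nodes.put(region, State.OPENING);
    }

    /** RS side: the open failed, so revert the node back to OFFLINE. */
    public void rsFailsOpen(String region) {
        requireState(region, State.OPENING); // assumption: failure happens mid-open
        nodes.put(region, State.OFFLINE);
    }

    /** Returns the current state of the node, or null if there is no node. */
    public State stateOf(String region) {
        return nodes.get(region);
    }

    private void requireState(String region, State expected) {
        if (nodes.get(region) != expected) {
            throw new IllegalStateException(
                region + " is " + nodes.get(region) + ", expected " + expected);
        }
    }
}
```

A failed-over master reading ZK sees OFFLINE in both cases and cannot tell from
the state alone which path produced it.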


> On 2010-10-08 13:43:59, stack wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java, line 675
> > <http://review.cloudera.org/r/995/diff/1/?file=14448#file14448line675>
> >
> >     Don't we have this in AssignmentManager already?
> >     isRegionsInTransition I believe it's called.
> >     
> >     There is white space added at end of the two @throws lines.

This checks ZK, not the RIT map on the master.  So for unit tests, those are 
two different things to test.  Since I'm mocking data up in ZK, I wanted to 
ensure nothing is left in ZK.
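
A minimal sketch of the distinction, assuming a toy model in which the master's
RIT map and the nodes in ZK are two separate structures.  None of these names
are the HBase API; they are hypothetical and only illustrate why an empty RIT
map does not imply a clean ZK.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model; every name below is hypothetical, not the HBase API. */
public class ZkVersusRitSketch {
    // Stand-in for the master's in-memory regions-in-transition map.
    private final Map<String, String> masterRit = new HashMap<>();
    // Stand-in for the transition nodes that exist in ZK.
    private final Set<String> zkNodes = new HashSet<>();

    /** What the test does: write a transition node straight into ZK. */
    public void mockNodeInZk(String region) {
        zkNodes.add(region);
    }

    /** What a running master does: track in memory and create the ZK node. */
    public void masterStartsTransition(String region, String state) {
        masterRit.put(region, state);
        zkNodes.add(region);
    }

    /** Master finished handling the region: clear both places. */
    public void masterFinishesTransition(String region) {
        masterRit.remove(region);
        zkNodes.remove(region);
    }

    public boolean ritIsEmpty() {
        return masterRit.isEmpty();
    }

    public boolean zkIsClean() {
        return zkNodes.isEmpty();
    }
}
```

After a node is mocked directly in ZK, `ritIsEmpty()` is still true while
`zkIsClean()` is false, so only the ZK-side check catches a leftover node.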


> On 2010-10-08 13:43:59, stack wrote:
> > trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java, line 462
> > <http://review.cloudera.org/r/995/diff/1/?file=14451#file14451line462>
> >
> >     What about the case where not all regions have been assigned -- say the 
> > master was killed mid-startup before all regions mentioned in .META. had 
> > been assigned by master?  There should be a fixup where we compare the 
> > difference?  Can we even handle this case?  We'd need to ask RSs what 
> > they are holding?

IMO we don't need to support this (for now).  I think it is acceptable to 
require that nothing fails during startup.  If the master dies or an RS dies 
during initial startup, you have to restart.  RS deaths may even work fine, but 
I think it's okay to have a SPOF during startup.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/995/#review1496
-----------------------------------------------------------


On 2010-10-07 16:34:04, Jonathan Gray wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/995/
> -----------------------------------------------------------
> 
> (Updated 2010-10-07 16:34:04)
> 
> 
> Review request for hbase and stack.
> 
> 
> Summary
> -------
> 
> First go at a unit test of master failover with regions in transition.
> 
> Comment from the test method:
> 
>   /**
>    * Complex test of master failover that tests as many permutations of the
>    * different possible states that regions in transition could be in within ZK.
>    * <p>
>    * This tests the proper handling of these states by the failed-over master
>    * and includes a thorough testing of the timeout code as well.
>    * <p>
>    * Starts with a single master and three regionservers.
>    * <p>
>    * Creates two tables, enabledTable and disabledTable, each containing 5
>    * regions.  The disabledTable is then disabled.
>    * <p>
>    * After reaching steady-state, the master is killed.  We then mock several
>    * states in ZK.
>    * <p>
>    * After mocking them, we will startup a new master which should become the
>    * active master and also detect that it is a failover.  The primary test
>    * passing condition will be that all regions of the enabled table are
>    * assigned and all the regions of the disabled table are not assigned.
>    * <p>
>    * The different scenarios to be tested are below:
>    * <p>
>    * <b>ZK State:  OFFLINE</b>
>    * <p>A node can get into OFFLINE state if</p>
>    * <ul>
>    * <li>An RS fails to open a region, so it reverts the state back to OFFLINE
>    * <li>The Master is assigning the region to a RS before it sends RPC
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Master has assigned an enabled region but RS failed so a region is
>    *     not assigned anywhere and is sitting in ZK as OFFLINE</li>
>    * <li>This seems to cover both cases?</li>
>    * </ul>
>    * <p>
>    * <b>ZK State:  CLOSING</b>
>    * <p>A node can get into CLOSING state if</p>
>    * <ul>
>    * <li>An RS has begun to close a region
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Region was being closed but the RS died before finishing the close
>    * <li>Region of enabled table was being closed but did not complete
>    * <li>Region of disabled table was being closed but did not complete
>    * </ul>
>    * <p>
>    * <b>ZK State:  CLOSED</b>
>    * <p>A node can get into CLOSED state if</p>
>    * <ul>
>    * <li>An RS has completed closing a region but not acknowledged by master yet
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Region of a table that should be enabled was closed on an RS
>    * <li>Region of a table that should be disabled was closed on an RS
>    * </ul>
>    * <p>
>    * <b>ZK State:  OPENING</b>
>    * <p>A node can get into OPENING state if</p>
>    * <ul>
>    * <li>An RS has begun to open a region
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>RS was opening a region of enabled table but never finishes
>    * </ul>
>    * <p>
>    * <b>ZK State:  OPENED</b>
>    * <p>A node can get into OPENED state if</p>
>    * <ul>
>    * <li>An RS has finished opening a region but not acknowledged by master yet
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Region of a table that should be enabled was opened on an RS
>    * <li>Region of a table that should be disabled was opened on an RS
>    * <li>Region of a table that should be enabled was opened by a now-dead RS
>    * <li>Region of a table that should be disabled was opened by a now-dead RS
>    * </ul>
>    * <p>
>    * <b>ZK State:  NONE</b>
>    * <p>A region might not have a transition node if</p>
>    * <ul>
>    * <li>The server hosting the region died and no master processed it
>    * </ul>
>    * <p>We will mock the scenarios</p>
>    * <ul>
>    * <li>Region of enabled table was on a dead RS that was not yet processed
>    * <li>Region of disabled table was on a dead RS that was not yet processed
>    * </ul>
>    * @throws Exception
>    */
> 
> 
> This addresses bug HBASE-2700.
>     http://issues.apache.org/jira/browse/HBASE-2700
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1005264 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1005264 
>   trunk/src/main/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java 1005264 
>   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1005264 
>   trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 1005264 
>   trunk/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java 1005264 
>   trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 1005264 
> 
> Diff: http://review.cloudera.org/r/995/diff
> 
> 
> Testing
> -------
> 
> running the unit test!
> 
> 
> Thanks,
> 
> Jonathan
> 
>
