-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/915/#review1349
-----------------------------------------------------------


Overall this looks like a good improvement over what we had.  I'm still a 
little confused about isRunningCluster (or isProperRunningCluster per comments).

Repeat from inline comments, but, is there ever a time a single region is 
deployed and we don't want to trigger the failover codepath?

Isn't the case we're really protecting against here that the cluster was not 
shutdown properly so the cluster status flag is up when it shouldn't be?

And does this handle case that cluster is killed quickly and then restarted 
again so the master ephemeral node is actually still there?  Then the RS will 
have master node and cluster up node and startup but potentially without a real 
master?


trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<http://review.cloudera.org/r/915/#comment4482>

    Why is this an "implementation"?  Doesn't the HRI represent the actual 
connection object?  I get that it's an implementation of HRI but normally that 
would be used in class names implementing?  No biggie, should just be 
consistent and seems a weird name to me (I think I was referring to this stuff 
as "connection" elsewhere in the class in method names/variable names)



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/915/#comment4481>

    Is this really the exception we want to throw (commons.lang)?  Or this is 
just short-term temporary?



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/915/#comment4483>

    yay thanks



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/915/#comment4484>

    So case that we are adding for here (but just throwing exception for now) 
is master came up, did not think it was fresh cluster (because cluster status 
flag in zk up? maybe note in comments above?), but we determine the cluster was 
not running because ROOT and META are not assigned.
    
    What about case where other regions are assigned?  Should this check 
actually be whether _any_ regions are assigned?  I think we discussed this, and 
I think looking for root/meta covers most cases, but maybe add a TODO?
    
    Though, even in failover case, we'll need to handle ROOT/META not being 
properly assigned, so if _any_ regions are assigned we would trigger failover, 
if no regions assigned we would assume it actually is a cluster startup and go 
into the branch of code which currently throws the exception.



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/915/#comment4485>

    javadoc about what this method does to determine if it's running cluster



trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.cloudera.org/r/915/#comment4486>

    So this method would be "proper running cluster"?
    
    Isn't it the case that if a single region is deployed anywhere we are not 
in startup, we are failover?



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.cloudera.org/r/915/#comment4487>

    looks good


- Jonathan


On 2010-09-28 15:52:46, Jonathan Gray wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/915/
> -----------------------------------------------------------
> 
> (Updated 2010-09-28 15:52:46)
> 
> 
> Review request for hbase, stack and Jonathan Gray.
> 
> 
> Summary
> -------
> 
> This is patch from Stack, just putting up on rb.
> 
> M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
>   Add test of case where HRegionInterface connection throws a
>   ConnectionException. Also tests two new verify root and meta 
>   locations added to CatalogTracker.
> M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
>   Change order in which we start up trackers in ZK.  Also add blocking
>   until master is up to make it less likely we'll start before master
>   comes up, especially around the cluster start up situation.
> M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
>   Introduce new state on startup, the case where the cluster is
>   NOT a fresh startup and its NOT a cluster where all is fully
>   assigned.  The repair the master needs run to fixup this new
>   state is not yet done; we throw a NotImplementedException for
>   now.  TODO.  Added new isRunningCluster checker used figuring
>   what the cluster condition is when master is joining.  Not
>   comprehensive but good enough for now.
> M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
>   Javadoc.
>   Added new verifyRootRegionLocation and verifyMetaRegionLocation.
>   Needed to verify whats in zk is actually locations of catalog
>   regions.
> M src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
>   Add fact that the verifying method, getRegionInfo, can throw
>   ConnectException
> 
> 
> This addresses bug HBASE-3047.
>     http://issues.apache.org/jira/browse/HBASE-3047
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 
> 1002359 
>   trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 
> 1002359 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1002359 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
> 1002359 
>   trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 
> 1002359 
> 
> Diff: http://review.cloudera.org/r/915/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jonathan
> 
>

Reply via email to