-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/915/
-----------------------------------------------------------
(Updated 2010-09-28 23:31:22.975377)
Review request for hbase, stack and Jonathan Gray.
Changes
-------
Here, this should be more robust. Your comments should be addressed too. For
sure, AM#processFailover has holes -- e.g. what if a regionserver crashes while
the new master is coming up -- but let's address that in another issue. Below
are notes on changes made since v1 of the patch.
M src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
Changed this because I saw a case where we hung forever (my guess is that
remaining became equal to NO_TIMEOUT). Redid the logic here.
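To illustrate the kind of hang guessed at above: in Java, Object.wait(0) waits
indefinitely, so a loop that recomputes the time remaining must never hand 0
to wait(). The following is only a sketch, not the patched tracker code; the
class name TimedValueSketch, its methods, and the assumption that NO_TIMEOUT
is 0 are all hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a tracker that blocks until data shows up.
final class TimedValueSketch {
    // Assumption: 0 means "wait forever", mirroring Object.wait(0).
    static final long NO_TIMEOUT = 0;

    private byte[] data;

    synchronized void set(byte[] value) {
        this.data = value;
        notifyAll();
    }

    // Returns the data, or null if the timeout elapsed (or we were interrupted).
    synchronized byte[] blockUntilAvailable(long timeoutMs) {
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("negative timeout");
        }
        boolean waitForever = timeoutMs == NO_TIMEOUT;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (data == null) {
                if (waitForever) {
                    wait();
                    continue;
                }
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    break; // Time is up; never call wait(0) by accident.
                }
                wait(remainingMs);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Preserve the interrupt for callers.
        }
        return data;
    }
}
```

The point of the design is that "no timeout" is decided once, up front, from
the caller's argument, and is never confused with a computed remainder that
happens to reach zero.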
M src/main/java/org/apache/hadoop/hbase/regionserver/Leases.java
Set this thread to be daemon. Have seen it hold up RS shutdowns.
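For reference, a minimal sketch of what marking a thread as daemon looks like
in plain Java. The helper name and thread name below are hypothetical and are
not taken from Leases.java; the JVM exits once only daemon threads remain, which
is why a non-daemon background thread can hold up a shutdown.

```java
// Hypothetical helper; not the actual Leases code.
final class DaemonThreadSketch {
    // Builds an unstarted background thread that will not block JVM exit.
    static Thread newDaemon(Runnable task, String name) {
        Thread t = new Thread(task, name);
        t.setDaemon(true); // Must be called before start().
        return t;
    }
}
```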
M src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
Renamed the initialize method to createInitialFileSystemLayout, made it
private, and called it from the constructor. It's idempotent and cheap, and no
one else needs to be concerned with these mechanics; encapsulate it.
M src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Removed the freshClusterStartup flag. Now, let any 'unknown' server in and
register it UNLESS it's a dead server (fixed up expiration so we add to dead
servers BEFORE we remove from online servers). Have waitForRegionServers
return the count of regions out on the cluster. This will be 0 if servers are
coming in with a clean regionServerStartup, but if they came in and were
registered on a regionServerReport, then they'll have a filled-out HServerLoad
with a count of regions. Use the count of regions as the way to tell whether
regions are out on the cluster or not.
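The registration rules above can be sketched roughly as follows. This is an
illustration under assumptions, not the ServerManager code: the class
ServerRegistrySketch, its method names, and the use of plain strings and ints
in place of server names and HServerLoad are all hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the online/dead server bookkeeping.
final class ServerRegistrySketch {
    // Server name -> number of regions it reports carrying.
    private final Map<String, Integer> onlineServers = new ConcurrentHashMap<>();
    private final Set<String> deadServers = ConcurrentHashMap.newKeySet();

    // Lets any unknown server in unless it is already known to be dead.
    boolean checkIn(String serverName, int regionCount) {
        if (deadServers.contains(serverName)) {
            return false;
        }
        onlineServers.put(serverName, regionCount);
        return true;
    }

    // Adds to dead servers BEFORE removing from online servers, so there is
    // no window in which the server is in neither collection.
    void expire(String serverName) {
        deadServers.add(serverName);
        onlineServers.remove(serverName);
    }

    // 0 means every server came in clean; > 0 means regions are already deployed.
    int countOfRegionsOnCluster() {
        int total = 0;
        for (int count : onlineServers.values()) {
            total += count;
        }
        return total;
    }
}
```

With the ordering in expire(), a report that races with expiration sees the
server as dead rather than as unknown, so it is not re-registered.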
M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Removed freshClusterStartup. Added logging of the state of the cluster-up flag
and the number of regionservers out on the cluster. Use the count of regions
out on the cluster to decide whether to assign all user regions or instead to
process failover. Always split WALs, and always check and reassign root and
meta, whether fresh startup or failover.
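A rough sketch of the startup decision described for HMaster, assuming the
region count is the only input. The class, enum, and method names here are
hypothetical and do not come from the patch.

```java
// Hypothetical model of the master's startup choice.
final class MasterStartupSketch {
    enum Action { ASSIGN_ALL_USER_REGIONS, PROCESS_FAILOVER }

    // WAL splitting and the root/meta check happen in both cases (not modeled
    // here); only the handling of user regions differs.
    static Action choose(int regionsOnCluster) {
        if (regionsOnCluster < 0) {
            throw new IllegalArgumentException("region count cannot be negative");
        }
        return regionsOnCluster == 0
            ? Action.ASSIGN_ALL_USER_REGIONS
            : Action.PROCESS_FAILOVER;
    }
}
```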
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
Added notes on holes in processFailover.
M src/main/resources/hbase-default.xml
Set checkin down from 5 to 3 seconds again.
Summary
-------
This is a patch from Stack; just putting it up on rb.
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
Add test of the case where the HRegionInterface connection throws a
ConnectException. Also tests the two new methods, added to CatalogTracker,
that verify root and meta locations.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Change order in which we start up trackers in ZK. Also add blocking
until master is up to make it less likely we'll start before master
comes up, especially around the cluster start up situation.
M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Introduce a new state on startup: the case where the cluster is
NOT a fresh startup and it's NOT a cluster where all is fully
assigned. The repair the master needs to run to fix up this new
state is not yet done; we throw a NotImplementedException for
now. TODO. Added a new isRunningCluster checker used in figuring out
what the cluster condition is when the master is joining. Not
comprehensive but good enough for now.
M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
Javadoc.
Added new verifyRootRegionLocation and verifyMetaRegionLocation.
Needed to verify that what's in zk is actually the locations of the catalog
regions.
M src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
Add the fact that the verifying method, getRegionInfo, can throw
ConnectException.
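The idea of verifying a location recorded in zk by actually calling the server
can be sketched like this. It is an assumption-laden illustration, not the
CatalogTracker code: RegionProbe stands in for the relevant slice of
HRegionInterface, and verifyLocation is a hypothetical name. Only
java.net.ConnectException is a real JDK class here.

```java
import java.net.ConnectException;

// Hypothetical verification helper.
final class LocationVerifierSketch {
    // Stand-in for the one remote call used as a liveness check.
    interface RegionProbe {
        void getRegionInfo(String regionName) throws ConnectException;
    }

    // True only if the server at the recorded location answers for the region.
    static boolean verifyLocation(RegionProbe probe, String regionName) {
        if (probe == null) {
            return false; // No location recorded at all.
        }
        try {
            probe.getRegionInfo(regionName);
            return true;
        } catch (ConnectException e) {
            return false; // Nothing listening at the recorded address.
        }
    }
}
```

A real check would presumably also need to handle a server that is reachable
but no longer serving the region; that case is left out of this sketch.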
This addresses bug HBASE-3047.
http://issues.apache.org/jira/browse/HBASE-3047
Diffs (updated)
-----
trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
1001981
trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1001981
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
1001981
trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1001981
trunk/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
1001981
trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1001981
trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
1001981
trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Leases.java 1001981
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
1001981
trunk/src/main/resources/hbase-default.xml 1001981
trunk/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
1001981
Diff: http://review.cloudera.org/r/915/diff
Testing
-------
Thanks,
Jonathan