-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1087/
-----------------------------------------------------------
(Updated 2010-10-25 23:25:36.390570)
Review request for hbase and stack.
Changes
-------
So, a few things extra after digging in w/ Jon.
1. A watch was not firing on .META. move because it was never being reset: in
MetaNodeTracker we were not calling super inside nodeDeleted to reset the
watch. (In a rolling restart, only a few servers would actually see .META.
move, and it was these that were hanging. Others would see .META. in its new
location when they came up.)
2. We were not assigning out .META. if the master had trouble reaching meta
before it saw the server expire. In that case we would reset the meta location
in the catalog tracker, but we were also using the catalog tracker to determine
which server was hosting meta, so the expired server was no longer recognized
as the meta host. We use a different technique now.
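To illustrate item 1, here is a minimal, self-contained sketch of why skipping
the super call loses later notifications. It is a toy model, not the patch:
OneShotWatchStore, Tracker, BrokenMetaTracker and FixedMetaTracker are
hypothetical names, and the only assumption carried over is that a watch fires
once and must be set again by the base class's nodeDeleted.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of one-shot watches; NOT the ZooKeeper or HBase API. */
public class WatchResetSketch {

    /** Hypothetical stand-in for a store whose watches fire once and are then cleared. */
    static class OneShotWatchStore {
        private final Set<Tracker> watchers = new HashSet<>();

        void setWatch(Tracker tracker) {
            watchers.add(tracker);
        }

        /** Notifies every currently registered watcher once, clearing the watches first. */
        void deleteNode() {
            Set<Tracker> toNotify = new HashSet<>(watchers);
            watchers.clear();
            for (Tracker tracker : toNotify) {
                tracker.nodeDeleted();
            }
        }
    }

    /** Hypothetical base tracker: its nodeDeleted() is what re-arms the watch. */
    static class Tracker {
        protected final OneShotWatchStore store;
        int notifications = 0;

        Tracker(OneShotWatchStore store) {
            this.store = store;
            store.setWatch(this);
        }

        void nodeDeleted() {
            notifications++;
            store.setWatch(this); // reset the watch so the next change is seen
        }
    }

    /** Overrides nodeDeleted() without calling super, so the watch is never reset. */
    static class BrokenMetaTracker extends Tracker {
        BrokenMetaTracker(OneShotWatchStore store) {
            super(store);
        }

        @Override
        void nodeDeleted() {
            notifications++; // missing super.nodeDeleted()
        }
    }

    /** Calls super so the base class resets the watch. */
    static class FixedMetaTracker extends Tracker {
        FixedMetaTracker(OneShotWatchStore store) {
            super(store);
        }

        @Override
        void nodeDeleted() {
            super.nodeDeleted();
        }
    }

    public static void main(String[] args) {
        OneShotWatchStore store = new OneShotWatchStore();
        Tracker broken = new BrokenMetaTracker(store);
        Tracker fixed = new FixedMetaTracker(store);

        store.deleteNode(); // first move: both trackers are notified
        store.deleteNode(); // second move: only the tracker that reset its watch is notified

        // prints "broken=1 fixed=2"
        System.out.println("broken=" + broken.notifications + " fixed=" + fixed.notifications);
    }
}
```

The broken tracker sees the first change and then goes silent, which matches
the symptom of only some servers hanging after a .META. move.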
Summary
-------
Adds new handling of the timeouts for PENDING_OPEN and PENDING_CLOSE in-memory
master RIT states.
Adds some new broken RIT states into TestMasterFailover.
Some of these broken states don't seem possible to me, but as long as we aren't
breaking the existing behaviors and tests, I think it's okay if we handle odd
cases that can be mocked. Who knows what will happen in the real world.
TestMasterFailover didn't, and still doesn't, really test for the issue in
HBASE-3147 because this new broken condition happens when an RS dies or goes
offline, not during a master failover concurrent with an RS failure.
v4 of the patch adds to Jon's fixes. It adds a shutdown server handler for
root and another for meta so that the processing of servers hosting meta/root
does not get frozen out. I've seen this in my testing.
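One way such freezing out can happen, sketched under assumptions: if ordinary
shutdown handlers wait on meta being available and occupy every thread in a
shared pool, the handler for the server that hosted meta cannot run until they
give up. The sketch below is illustrative only and is not the patch's code.
DedicatedMetaPoolSketch, ordinaryHandlerSawMeta, the single-thread pools, and
the 500 ms wait are all hypothetical; it uses only java.util.concurrent.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Illustrative only: a dedicated pool keeps meta handling from queueing behind blocked work. */
public class DedicatedMetaPoolSketch {

    /**
     * Runs one "ordinary" shutdown handler that waits up to 500 ms for meta to be
     * assigned, plus one "meta" shutdown handler that performs the assignment.
     * Returns true if the ordinary handler saw meta assigned before timing out.
     */
    static boolean ordinaryHandlerSawMeta(boolean dedicatedMetaPool) throws Exception {
        ExecutorService generalPool = Executors.newFixedThreadPool(1);
        // Assumption: the meta handler either shares the general pool or gets its own.
        ExecutorService metaPool =
            dedicatedMetaPool ? Executors.newFixedThreadPool(1) : generalPool;
        CountDownLatch metaAssigned = new CountDownLatch(1);

        // Ordinary handler: blocks its pool thread until meta is assigned or it times out.
        Callable<Boolean> ordinaryHandler =
            () -> metaAssigned.await(500, TimeUnit.MILLISECONDS);
        // Meta handler: stands in for reassigning meta.
        Runnable metaHandler = metaAssigned::countDown;

        try {
            Future<Boolean> ordinary = generalPool.submit(ordinaryHandler);
            Future<?> meta = metaPool.submit(metaHandler);
            boolean sawMeta = ordinary.get(5, TimeUnit.SECONDS);
            meta.get(5, TimeUnit.SECONDS);
            return sawMeta;
        } finally {
            generalPool.shutdownNow();
            metaPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("dedicated pool: " + ordinaryHandlerSawMeta(true));
        System.out.println("shared pool:    " + ordinaryHandlerSawMeta(false));
    }
}
```

With the shared single-thread pool the meta handler sits in the queue behind
the waiting handler, so the wait times out; with its own pool it runs right
away and the ordinary handler proceeds.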
This addresses bug HBASE-3147.
http://issues.apache.org/jira/browse/HBASE-3147
Diffs (updated)
-----
trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
1027351
trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 1027351
trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
1027351
trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java
1027351
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
1027351
trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1027351
trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1027351
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java
PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
1027351
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
1027351
trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1027351
trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
1027351
Diff: http://review.cloudera.org/r/1087/diff
Testing
-------
TestMasterFailover passes.
Thanks,
Jonathan