I think there is a problem in 0.90.6. Rolling restart seems broken.
By mistake I had the previous RC deployed on the cluster and had only updated the master.
My cluster would not start. The master would assign out -ROOT- but it
would fail to open on the regionserver with this:
2012-02-27 20:16:09,559 DEBUG
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
Processing open of -ROOT-,,0.70236052
2012-02-27 20:16:09,561 DEBUG
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x135c07495b70002 Attempting to transition node
70236052/-ROOT- from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2012-02-27 20:16:09,570 WARN
org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:7003-0x135c07495b70002 Attempt to transition the
unassigned node for 70236052 from M_ZK_REGION_OFFLINE to
RS_ZK_REGION_OPENING failed, the node existed but was in the state
M_SERVER_SHUTDOWN set by the server sv4r11s38:7001
2012-02-27 20:16:09,570 WARN
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed
transition from OFFLINE to OPENING for region=70236052
2012-02-27 20:16:09,570 WARN
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Region
was hijacked? It no longer exists, encodedName=70236052
See how it's reading a state of M_ZK_REGION_OFFLINE as
M_SERVER_SHUTDOWN?
This seems to be because of this commit:
------------------------------------------------------------------------
r1244137 | tedyu | 2012-02-14 09:54:23 -0800 (Tue, 14 Feb 2012) | 3 lines
HBASE-5379 Backport HBASE-4287 to 0.90 - If region opening fails, try
to transition region back to
"offline" in ZK (Ram)
It does this:
Index: src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java (revision 1090348)
+++ src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java (working copy)
@@ -107,6 +107,7 @@
RS_ZK_REGION_CLOSED (2), // RS has finished closing a region
RS_ZK_REGION_OPENING (3), // RS is in process of opening a region
RS_ZK_REGION_OPENED (4), // RS has finished opening a region
+ RS_ZK_REGION_FAILED_OPEN (5), // RS failed to open a region
// Messages originating from Master to RS
M_RS_OPEN_REGION (20), // Master asking RS to open a region
If you look at EventType in EventHandler, the constructor does nothing
w/ the passed value. That's a problem. It means the enum is using the
default ordinal, so adding the above constant in the middle of the enum
shifts every constant declared after it up by one; M_ZK_REGION_OFFLINE
is declared just before M_SERVER_SHUTDOWN, so the new master's OFFLINE
reads as SHUTDOWN on an old regionserver.
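To make the shift concrete, here is a minimal sketch. It is not the actual HBase code: the enums are trimmed to a few constants, the class and method names (OrdinalShiftDemo, writeOrdinal, readOrdinal, StableEventType, fromCode) are made up for illustration, and it assumes the state is written to ZK as the enum ordinal, which is what the log above suggests. The codes 50 and 70 are illustrative; only 3, 4 and 5 appear in the diff.

```java
public class OrdinalShiftDemo {
    /** Trimmed, hypothetical view of EventType as the old regionserver sees it. */
    public enum OldEventType {
        RS_ZK_REGION_OPENING,
        RS_ZK_REGION_OPENED,
        M_ZK_REGION_OFFLINE,
        M_SERVER_SHUTDOWN
    }

    /** Same enum as the patched master sees it: one constant added in the middle. */
    public enum NewEventType {
        RS_ZK_REGION_OPENING,
        RS_ZK_REGION_OPENED,
        RS_ZK_REGION_FAILED_OPEN,
        M_ZK_REGION_OFFLINE,
        M_SERVER_SHUTDOWN
    }

    /** Assumed wire format: the writer stores the enum's ordinal. */
    public static int writeOrdinal(NewEventType type) {
        return type.ordinal();
    }

    /** The reader maps the stored ordinal back using its own enum definition. */
    public static OldEventType readOrdinal(int wireValue) {
        return OldEventType.values()[wireValue];
    }

    /** One possible fix: keep the explicit code in a field and use it on the wire. */
    public enum StableEventType {
        RS_ZK_REGION_OPENING(3),
        RS_ZK_REGION_OPENED(4),
        RS_ZK_REGION_FAILED_OPEN(5),
        M_ZK_REGION_OFFLINE(50),  // illustrative code, not taken from the diff
        M_SERVER_SHUTDOWN(70);    // illustrative code, not taken from the diff

        private final int code;

        StableEventType(int code) {
            this.code = code;
        }

        public int code() {
            return code;
        }

        /** Looks a constant up by its explicit code, independent of declaration order. */
        public static StableEventType fromCode(int code) {
            for (StableEventType type : values()) {
                if (type.code == code) {
                    return type;
                }
            }
            throw new IllegalArgumentException("Unknown event type code: " + code);
        }
    }

    public static void main(String[] args) {
        // Patched master writes OFFLINE; unpatched regionserver decodes it.
        int wire = writeOrdinal(NewEventType.M_ZK_REGION_OFFLINE);
        System.out.println("master wrote M_ZK_REGION_OFFLINE as " + wire);
        System.out.println("old regionserver read " + readOrdinal(wire));

        // With explicit codes the round trip does not depend on position.
        int code = StableEventType.M_ZK_REGION_OFFLINE.code();
        System.out.println("explicit code " + code + " maps to " + StableEventType.fromCode(code));
    }
}
```

With the ordinal on the wire, the old reader turns the master's M_ZK_REGION_OFFLINE into M_SERVER_SHUTDOWN, same as in the log. With explicit codes, inserting a constant would not change what any existing constant serializes to.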
It looks like we need to back out HBASE-5379 from the 0.90 branch and cut a new RC.
Does rolling restart work for you, Ram?
St.Ack
On Sat, Feb 18, 2012 at 11:25 PM, rama krishna <[email protected]> wrote:
>
> Hi Devs
> The download of 0.90.6RC4 is available at
> http://people.apache.org/~ramkrishna/0.90.6RC4/
> The release has been signed by Stack as my key is not yet registered with
> web of trust.
> The new issues added to 0.90 after RC3 are:
> HBASE-5377 Fix licenses on the 0.90 branch.
> HBASE-5379 Backport HBASE-4287 to 0.90 - If region opening fails, try to
> transition region back
> to "offline" in ZK
> HBASE-5396 Handle the regions in regionPlans while processing
> ServerShutdownHandler (Jieshan)
> Improvements
> HBASE-5327 Print a message when an invalid hbase.rootdir is passed
> (Jimmy Xiang)
> HBASE-5197 [replication] Handle socket timeouts in ReplicationSource
> to prevent DDOS
> HBASE-5395 CopyTable needs to use GenericOptionsParser
> I would like to freeze check-ins to 0.90 until this RC is released.
> Please provide your votes on the release. The voting closes on 25th Feb.
> Hope to release 0.90.6 before Feb ends. Thanks to all who contributed;
> looking forward to your support.
> Regards
> Ram
>
>
>