Regionservers waiting for ROOT while Master waiting for RegionServers
---------------------------------------------------------------------

                 Key: HBASE-3265
                 URL: https://issues.apache.org/jira/browse/HBASE-3265
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.0
            Reporter: Todd Lipcon
            Priority: Critical


After a cluster disaster caused by a disconnected switch, I ended up in a state
where the master was up with no region servers (see HBASE-3263). When I brought
the RSs back up, the aforementioned bug kept the master from reaching a healthy
state (an internal data structure contained a null). So I killed the master and
started it again. Now the master is in "Waiting for region servers to check in"
mode, and the region servers are stuck in the following stack:

        - locked <0x00002aaab1bda5d0> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:177)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:537)
        at java.lang.Thread.run(Thread.java:619)

I imagine what happened is that the RSs got through tryReportForDuty() with the
old master, but the old master could not assign anything because of its bad
state. So when I killed it, all the RSs were already stuck in waitForRoot(), and
when I brought the new master up, no region server reported for duty to it. The
master waits for region servers, and the region servers wait for ROOT, which
only the master can assign.
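
A minimal sketch of the circular wait described above, modeling the master and one region server as two threads coordinated by java.util.concurrent.CountDownLatch. This is not HBase code. The class name RootWaitDeadlockSketch, the simulate() method, the rsReportsToNewMaster flag, and the 300 ms timeouts are all hypothetical. The timeouts exist only so the demo terminates instead of hanging like the real servers. With the flag off (the situation in this report), ROOT is never assigned. With it on, meaning the RS checks in with the restarted master, the cycle is broken.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the master/RS startup wait; not HBase code.
class RootWaitDeadlockSketch {

    /** Runs one simulated startup and returns true if the RS ever sees ROOT assigned. */
    static boolean simulate(boolean rsReportsToNewMaster) throws InterruptedException {
        CountDownLatch serverCheckedIn = new CountDownLatch(1);
        CountDownLatch rootAssigned = new CountDownLatch(1);
        AtomicBoolean rsSawRoot = new AtomicBoolean(false);

        // Restarted master: will not assign ROOT until some region server checks in.
        Thread master = new Thread(() -> {
            try {
                if (serverCheckedIn.await(300, TimeUnit.MILLISECONDS)) {
                    rootAssigned.countDown(); // stands in for assigning ROOT
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Region server: it already reported for duty to the old master, so by
        // default it skips checking in and goes straight to waiting for ROOT.
        Thread regionServer = new Thread(() -> {
            if (rsReportsToNewMaster) {
                serverCheckedIn.countDown(); // hypothetical re-report to the new master
            }
            try {
                rsSawRoot.set(rootAssigned.await(300, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        master.start();
        regionServer.start();
        master.join();
        regionServer.join();
        return rsSawRoot.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stuck case, ROOT assigned: " + simulate(false));
        System.out.println("re-report case, ROOT assigned: " + simulate(true));
    }
}
```

Each side waits on an event only the other side can produce, so neither makes progress unless the region server repeats its check-in with the new master.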
