Regionservers waiting for ROOT while Master waiting for RegionServers
---------------------------------------------------------------------
Key: HBASE-3265
URL: https://issues.apache.org/jira/browse/HBASE-3265
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Critical
After a cluster disaster caused by a disconnected switch, I ended up in a state
where the master was up with no region servers (see HBASE-3263). When I brought
the region servers back up, the master, because of that bug, didn't get itself
into a happy state (an internal data structure contained a null). So I killed
the master and started it again. Now the master is in "Waiting for region
servers to check in" mode, and the region servers are stuck in the following stack:
    - locked <0x00002aaab1bda5d0> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:177)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:537)
    at java.lang.Thread.run(Thread.java:619)
I suspect the region servers got through tryReportForDuty() with the old
master, but the old master was unable to assign anything because of its bad
state. So when I killed it, every region server was stuck in waitForRoot(), and
when I brought the new master up, none of them reported for duty to it.
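The suspected sequence is a circular wait: each region server reports for duty
only once and then blocks on ROOT, while the replacement master waits for
check-ins that never come. Below is a minimal, hypothetical Java model of that
interaction, not HBase code. The names `CircularWaitSketch`, `Master`,
`reportForDuty()`, `waitForRegionServers()` and `simulate()` are illustrative
assumptions, and a `CountDownLatch` that is never released stands in for the
ROOT assignment the broken master never performed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the circular wait described above.
 * None of these names are HBase APIs; they only mirror the roles involved.
 */
public class CircularWaitSketch {

    /** Stands in for a master instance waiting for region servers to check in. */
    static final class Master {
        private final CountDownLatch checkIns;

        Master(int expectedServers) {
            this.checkIns = new CountDownLatch(expectedServers);
        }

        /** Called by a region server; loosely analogous to tryReportForDuty. */
        void reportForDuty() {
            checkIns.countDown();
        }

        /** Returns true only if every expected region server checked in before the timeout. */
        boolean waitForRegionServers(long timeoutMs) {
            try {
                return checkIns.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /**
     * Region servers report to the old master once, then block "waiting for ROOT",
     * which is never assigned. A replacement master then waits for check-ins.
     * Returns whether the replacement master saw all region servers check in.
     */
    static boolean simulate(int regionServers, long timeoutMs) {
        Master oldMaster = new Master(regionServers);
        CountDownLatch rootAssigned = new CountDownLatch(1); // never counted down

        Thread[] servers = new Thread[regionServers];
        for (int i = 0; i < regionServers; i++) {
            servers[i] = new Thread(() -> {
                oldMaster.reportForDuty(); // one-time report, to the old master only
                try {
                    rootAssigned.await(); // loosely analogous to waitForRoot(): blocks here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            servers[i].setDaemon(true);
            servers[i].start();
        }

        // Make sure every server has reported to the old master before it "dies".
        oldMaster.waitForRegionServers(timeoutMs);

        // The old master is killed and replaced; no server reports to the new one.
        Master newMaster = new Master(regionServers);
        boolean allCheckedIn = newMaster.waitForRegionServers(timeoutMs);

        for (Thread t : servers) {
            t.interrupt(); // release the simulated region servers
        }
        return allCheckedIn;
    }

    public static void main(String[] args) {
        boolean ok = simulate(3, 200);
        System.out.println("new master saw all region servers: " + ok);
    }
}
```

Running `main` prints `new master saw all region servers: false`: the
replacement master times out because the report happens once per region server
lifetime, and every server is already parked in the ROOT wait.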