[ https://issues.apache.org/jira/browse/HBASE-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964928#action_12964928 ]
Jonathan Gray commented on HBASE-3265:
--------------------------------------
I dug into this one and had a hard time understanding what was preventing the
RS from heartbeating in. I'd need a stack dump to see what was going on with that RS.
Without one, I'm not sure how to address this short of a larger overhaul of how we
reconcile our two data points for RS availability (ZK ephemeral nodes and RPC heartbeats).
> Regionservers waiting for ROOT while Master waiting for RegionServers
> ---------------------------------------------------------------------
>
> Key: HBASE-3265
> URL: https://issues.apache.org/jira/browse/HBASE-3265
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Priority: Critical
> Fix For: 0.90.0
>
>
> After a cluster disaster caused by a disconnected switch, I ended up in a
> state where the master was up with no region servers (see HBASE-3263). When I
> brought the RSs back up, because of the aforementioned bug, the master didn't
> get itself into a happy state (an internal data structure had a null in it).
> So I killed the master and started it again. Now, the master is in "Waiting
> for region servers to check in" mode, and the region servers are in the
> following stack:
> - locked <0x00002aaab1bda5d0> (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
> at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:177)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:537)
> at java.lang.Thread.run(Thread.java:619)
> I imagine what happened is that the RSs got through "tryReportForDuty" with
> the old master, but the old master was unable to assign anything due to its bad
> state. So when it crashed, all the RSs were stuck in waitForRoot(), and when
> I brought the new master up, none of them were reporting for duty.