[ https://issues.apache.org/jira/browse/HBASE-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924396#action_12924396 ]
Jonathan Gray commented on HBASE-3147:
--------------------------------------
It seems our in-memory RIT state is PENDING_OPEN, but the region is OFFLINE in ZK.
This looks like a potentially common case: the server it was being assigned to
was just not there and never began opening it.
We should probably differentiate between a PENDING_OPEN timeout and an OPENING
timeout. Let me see what I find in the code.
(Your paste seems to lack line breaks, so this JIRA is a mile wide.)
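The PENDING_OPEN vs. OPENING split proposed above can be sketched as a small decision function. This is a minimal sketch under a simplified model in which the master knows both its in-memory RIT state and the current state of the region's unassigned znode. `RitTimeoutSketch`, `RitState`, `ZkState`, `TimeoutAction`, and `onTimeout` are hypothetical names for illustration, not HBase's `AssignmentManager` code.

```java
/**
 * Hypothetical sketch of splitting RIT timeout handling by state.
 * None of these names come from HBase; they only model the decision.
 */
public class RitTimeoutSketch {
    /** Master's in-memory region-in-transition state (simplified). */
    enum RitState { PENDING_OPEN, OPENING }

    /** State of the region's unassigned znode in ZooKeeper (simplified). */
    enum ZkState { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }

    enum TimeoutAction {
        /** Znode is already OFFLINE: pick a server and send the open again. */
        REASSIGN,
        /** Znode shows OPENING: move it back to OFFLINE first, then reassign. */
        FORCE_OFFLINE_THEN_REASSIGN
    }

    static TimeoutAction onTimeout(RitState ritState, ZkState zkState) {
        switch (ritState) {
            case PENDING_OPEN:
                // Usual case: the target server never began opening the region,
                // so the znode is still OFFLINE and there is nothing to undo.
                // If the server did start (znode OPENING), treat it like OPENING.
                return zkState == ZkState.M_ZK_REGION_OFFLINE
                        ? TimeoutAction.REASSIGN
                        : TimeoutAction.FORCE_OFFLINE_THEN_REASSIGN;
            case OPENING:
                // A server started opening and stalled: reclaim the znode
                // (OPENING -> OFFLINE) before reassigning. If it is already
                // OFFLINE, reassign instead of skipping the timeout.
                return zkState == ZkState.RS_ZK_REGION_OPENING
                        ? TimeoutAction.FORCE_OFFLINE_THEN_REASSIGN
                        : TimeoutAction.REASSIGN;
            default:
                throw new IllegalArgumentException("unhandled state: " + ritState);
        }
    }

    public static void main(String[] args) {
        // The case described above: PENDING_OPEN in memory, OFFLINE in ZK.
        System.out.println(onTimeout(RitState.PENDING_OPEN, ZkState.M_ZK_REGION_OFFLINE)); // prints REASSIGN
    }
}
```

In this simplified model the action ends up depending mostly on the znode state; the point it illustrates is that an already-OFFLINE znode should lead to a reassign rather than a skipped timeout.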
> Regions stuck in transition after rolling restart, perpetual timeout handling
> but nothing happens
> -------------------------------------------------------------------------------------------------
>
> Key: HBASE-3147
> URL: https://issues.apache.org/jira/browse/HBASE-3147
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Fix For: 0.90.0
>
>
> The rolling restart script is great for bringing on the weird stuff. On my
> little loaded cluster, if I run it, it horks the cluster and the cluster
> doesn't recover. I notice two issues that need fixing:
> 1. We'll miss noticing that a server was carrying .META., so .META. never gets
> assigned -- the shutdown handlers get stuck in a perpetual wait on a .META.
> assign that will never happen.
> 2. Perpetual cycling of this sequence for each region not successfully assigned:
> {code}
> 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b. state=PENDING_OPEN, ts=1287869814294
> 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN or OPENING for too long, reassigning region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
> 2010-10-23 21:37:57,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2bd57d1475046a Attempting to transition node 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE
> 2010-10-23 21:37:57,404 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x2bd57d1475046a Attempt to transition the unassigned node for 7f2d92497d2d03917afd574ea2aca55b from RS_ZK_REGION_OPENING to M_ZK_REGION_OFFLINE failed, the node existed but was in the state M_ZK_REGION_OFFLINE
> 2010-10-23 21:37:57,404 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region transitioned OPENING to OFFLINE so skipping timeout, region=usertable,user510588360,1287547556587.7f2d92497d2d03917afd574ea2aca55b.
>
> ,,,
> {code}
> The timeout period elapses again, and then the same sequence repeats.
> This is what I've been working on.
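The cycle in the quoted log can be reduced to a toy model, assuming (as the ZKAssign WARN line implies) that the unassigned znode only accepts a transition when it is currently in the expected state. `PerpetualTimeoutModel`, `FakeZnode`, `transition`, and `runTimeouts` are hypothetical names for illustration; this is not ZooKeeper or HBase code, and the `fixed` flag is an assumed alternative behaviour, not the actual patch.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Toy model of the timeout loop in the quoted log. FakeZnode and
 * runTimeouts are hypothetical; this is not ZooKeeper or HBase code.
 */
public class PerpetualTimeoutModel {
    enum ZkState { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }

    /** Stand-in for an unassigned znode with a conditional transition. */
    static final class FakeZnode {
        private final AtomicReference<ZkState> state;

        FakeZnode(ZkState initial) { state = new AtomicReference<>(initial); }

        ZkState get() { return state.get(); }

        /** Moves to target only if the node is currently in expected. */
        boolean transition(ZkState expected, ZkState target) {
            return state.compareAndSet(expected, target);
        }
    }

    /**
     * Fires up to rounds timeouts for one region and returns the number of
     * reassignments issued (0 or 1). When fixed is false, a failed
     * OPENING -> OFFLINE transition is treated as "skip this timeout".
     */
    static int runTimeouts(FakeZnode node, int rounds, boolean fixed) {
        for (int i = 0; i < rounds; i++) {
            boolean moved = node.transition(
                    ZkState.RS_ZK_REGION_OPENING, ZkState.M_ZK_REGION_OFFLINE);
            boolean alreadyOffline = node.get() == ZkState.M_ZK_REGION_OFFLINE;
            if (moved || (fixed && alreadyOffline)) {
                return 1; // reassign issued; region leaves this model's loop
            }
            // Old path: "skipping timeout". Nothing changes, so the next
            // round repeats exactly the same failed transition.
        }
        return 0;
    }

    public static void main(String[] args) {
        FakeZnode stuck = new FakeZnode(ZkState.M_ZK_REGION_OFFLINE);
        System.out.println(runTimeouts(stuck, 10, false)); // prints 0
        System.out.println(runTimeouts(stuck, 10, true));  // prints 1
    }
}
```

With the znode already OFFLINE, the old path returns 0 however many rounds run, which matches the "timeout elapses again, then the same sequence" behaviour in the report; treating an already-OFFLINE node as safe to reassign breaks the loop on the first round.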
--
This message is automatically generated by JIRA.