[ 
https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010796#comment-13010796
 ] 

Jonathan Gray commented on HBASE-3669:
--------------------------------------

When I've seen this happen, another RS had cut in and transitioned the 
node to OPENING.

As someone in the other JIRA indicates, this kind of thing can happen when one 
of the RSs is unable to open the region because it doesn't have the proper 
compression lib or because it hits some DFS error.

If the master successfully transfers to OFFLINE and the RS sees it as OPENING, 
then almost certainly there's another RS that has gotten in the way.

The contents of the RIT znode actually include serverName, so we should 
probably add additional debug information when the state transition fails.  
(Unable to go from OFFLINE to OPENING because already in OPENING by server 
#serverName#)
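
As a rough illustration of that message, here is a minimal sketch of a 
compare-and-set transition that reports which server currently owns the node 
when it fails. Everything in it is hypothetical: RitTransitionSketch, NodeData 
and transition() are made-up names, and an in-memory AtomicReference stands in 
for the znode. It is not the ZKAssign code.

```java
import java.util.concurrent.atomic.AtomicReference;

class RitTransitionSketch {
    /** Hypothetical stand-in for the RIT znode payload: a state plus the server that wrote it. */
    static final class NodeData {
        final String state;
        final String serverName;

        NodeData(String state, String serverName) {
            this.state = state;
            this.serverName = serverName;
        }
    }

    /** In-memory stand-in for one znode; real code would read and write through ZooKeeper. */
    static final AtomicReference<NodeData> znode = new AtomicReference<>();

    /**
     * Tries to move the node from expected to next on behalf of server.
     * Returns null on success, otherwise a message naming the current state and its owner.
     */
    static String transition(String expected, String next, String server) {
        NodeData current = znode.get();
        if (current == null) {
            return "Unable to go from " + expected + " to " + next
                + " because the node does not exist";
        }
        // Only swap if the node is still in the expected state and nobody changed it meanwhile.
        if (current.state.equals(expected)
                && znode.compareAndSet(current, new NodeData(next, server))) {
            return null;
        }
        // Re-read so the message reflects whoever won the race (assumes the node is not deleted).
        NodeData seen = znode.get();
        return "Unable to go from " + expected + " to " + next
            + " because already in " + seen.state + " by server " + seen.serverName;
    }

    public static void main(String[] args) {
        // The master forces the node OFFLINE, then two region servers race to OPENING.
        znode.set(new NodeData("OFFLINE", "master"));
        System.out.println(transition("OFFLINE", "OPENING", "sv2borg180,60020,1300384550966"));
        System.out.println(transition("OFFLINE", "OPENING", "sv2borg171,60020,1300399357135"));
    }
}
```

In this sketch the second caller gets back a message that names the first 
server, which is the extra detail the existing WARN line leaves out.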

> Region in PENDING_OPEN keeps being bounced between RS and master
> ----------------------------------------------------------------
>
>                 Key: HBASE-3669
>                 URL: https://issues.apache.org/jira/browse/HBASE-3669
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.90.2
>
>
> After going crazy killing region servers after HBASE-3668, most of the 
> cluster recovered except for 3 regions that kept being refused by the region 
> servers.
> On the master I would see:
> {code}
> 2011-03-17 22:23:14,828 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  
> supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
>  state=PENDING_OPEN, ts=1300400554826
> 2011-03-17 22:23:14,828 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_OPEN for too long, reassigning 
> region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
> 2011-03-17 22:23:14,828 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
>  state=PENDING_OPEN, ts=1300400554826
> 2011-03-17 22:23:14,828 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
> was found (or we are ignoring an existing plan) for 
> supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
>  so generated a random one; 
> hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
>  src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null) 
> available servers
> 2011-03-17 22:23:14,828 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
>  to sv2borg171,60020,1300399357135
> {code}
> Then on the region server:
> {code}
> 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x22d627c142707d2 Attempting to transition node 
> f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
> RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode 
> /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21; 
> data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
>  server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned 
> node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to 
> RS_ZK_REGION_OPENING failed, the node existed but was in the state 
> RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 WARN 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
> transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21
> {code}
> I'm not sure I fully understand what was going on... the master was supposed 
> to OFFLINE the znode but then that's not what the region server was seeing? 
> In any case, I was able to recover by doing a force unassign for each region 
> and then assign.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
