[ https://issues.apache.org/jira/browse/HBASE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-3669:
--------------------------
Fix Version/s: 0.94.0 (was: 0.92.0)
> Region in PENDING_OPEN keeps being bounced between RS and master
> ----------------------------------------------------------------
>
> Key: HBASE-3669
> URL: https://issues.apache.org/jira/browse/HBASE-3669
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.1
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Fix For: 0.94.0
>
> Attachments: HBASE-3669-debug-v1.patch
>
>
> After going crazy killing region servers after HBASE-3668, most of the
> cluster recovered except for 3 regions that kept being refused by the region
> servers.
> On the master I would see:
> {code}
> 2011-03-17 22:23:14,828 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed
> out:
> supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
> state=PENDING_OPEN, ts=1300400554826
> 2011-03-17 22:23:14,828 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been
> PENDING_OPEN for too long, reassigning
> region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
> 2011-03-17 22:23:14,828 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
> was=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
> state=PENDING_OPEN, ts=1300400554826
> 2011-03-17 22:23:14,828 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan
> was found (or we are ignoring an existing plan) for
> supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
> so generated a random one;
> hri=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
> src=, dest=sv2borg171,60020,1300399357135; 17 (online=17, exclude=null)
> available servers
> 2011-03-17 22:23:14,828 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.
> to sv2borg171,60020,1300399357135
> {code}
> Then on the region server:
> {code}
> 2011-03-17 22:23:14,829 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x22d627c142707d2 Attempting to transition node
> f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to
> RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> regionserver:60020-0x22d627c142707d2 Retrieved 166 byte(s) of data from znode
> /hbase/unassigned/f11849557c64c4efdbe0498f3fe97a21;
> data=region=supr_rss_items,ea0a3ac6c8779dab:872333599:ed1a7ad00f076fd98fcd3adcd98b62c6,1285707378709.f11849557c64c4efdbe0498f3fe97a21.,
> server=sv2borg180,60020,1300384550966, state=RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x22d627c142707d2 Attempt to transition the unassigned
> node for f11849557c64c4efdbe0498f3fe97a21 from M_ZK_REGION_OFFLINE to
> RS_ZK_REGION_OPENING failed, the node existed but was in the state
> RS_ZK_REGION_OPENING
> 2011-03-17 22:23:14,832 WARN
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed
> transition from OFFLINE to OPENING for region=f11849557c64c4efdbe0498f3fe97a21
> {code}
> I'm not sure I fully understand what was going on... the master was supposed
> to OFFLINE the znode, but that's not what the region server was seeing?
> In any case, I was able to recover by doing a force unassign for each region
> and then assign.
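
The failure in the region server log can be sketched as a compare-and-set on the znode state: the region server only moves the node to RS_ZK_REGION_OPENING if it currently reads M_ZK_REGION_OFFLINE. The following is a minimal in-memory model of that check, not the actual ZKAssign code; the `Znode` class and its `transition` method are hypothetical names introduced for illustration, and only the two state names are taken from the logs above. It does not try to explain why the znode still held the earlier RS_ZK_REGION_OPENING value written by sv2borg180.

```java
import java.util.concurrent.atomic.AtomicReference;

class ZnodeTransitionSketch {
    // State names as they appear in the logs; the enum itself is illustrative.
    enum State { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }

    // Hypothetical in-memory stand-in for /hbase/unassigned/<region>.
    // This is an assumption for illustration, not the HBase ZKAssign API.
    static final class Znode {
        private final AtomicReference<State> state;

        Znode(State initial) {
            this.state = new AtomicReference<>(initial);
        }

        // Succeeds only if the node is currently in the expected state.
        boolean transition(State expected, State next) {
            return state.compareAndSet(expected, next);
        }

        State current() {
            return state.get();
        }
    }

    public static void main(String[] args) {
        // Case from the report: the node still reads RS_ZK_REGION_OPENING
        // when the newly chosen region server tries OFFLINE -> OPENING.
        Znode stale = new Znode(State.RS_ZK_REGION_OPENING);
        boolean ok = stale.transition(State.M_ZK_REGION_OFFLINE,
                                      State.RS_ZK_REGION_OPENING);
        System.out.println("stale node transition succeeded: " + ok);

        // Case the region server expects: the master has set the node OFFLINE.
        Znode fresh = new Znode(State.M_ZK_REGION_OFFLINE);
        ok = fresh.transition(State.M_ZK_REGION_OFFLINE,
                              State.RS_ZK_REGION_OPENING);
        System.out.println("fresh node transition succeeded: " + ok);
    }
}
```

Under this model, every reassignment fails the same way until something resets the node to the expected state, which matches the region being bounced between the master and the region servers until the forced unassign.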