[ 
https://issues.apache.org/jira/browse/HBASE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176517#comment-13176517
 ] 

ramkrishna.s.vasudevan commented on HBASE-3741:
-----------------------------------------------

@Johnyang
The final fix is in HBASE-4186.  So you may have to upgrade to 0.90.5. 

                
> Make HRegionServer aware of the regions it's opening/closing
> ------------------------------------------------------------
>
>                 Key: HBASE-3741
>                 URL: https://issues.apache.org/jira/browse/HBASE-3741
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.90.3
>
>         Attachments: HBASE-3741-rsfix-v2.patch, HBASE-3741-rsfix-v3.patch, 
> HBASE-3741-rsfix.patch, HBASE-3741-trunk.patch
>
>
> This is a serious issue about a race between regions being opened and closed 
> in region servers. We had this situation where the master tried to unassign a 
> region for balancing, failed, force unassigned it, force assigned it 
> somewhere else, failed to open it on another region server (took too long), 
> and then reassigned it back to the original region server. A few seconds 
> later, the region server processed the first closed and the region was left 
> unassigned.
> This is from the master log:
> {quote}
> 11-04-05 15:11:17,758 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
> Sent CLOSE to serverName=sv4borg42,60020,1300920459477, load=(requests=187, 
> regions=574, usedHeap=3918, maxHeap=6973) for region 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:12:10,021 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
>  state=PENDING_CLOSE, ts=1302041477758
> 2011-04-05 15:12:10,021 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> ...
> 2011-04-05 15:14:45,783 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
>  state=CLOSED, ts=1302041685733
> 2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:60000-0x42ec2cece810b68 Creating (or updating) unassigned node for 
> 1470298961 with OFFLINE state
> ...
> 2011-04-05 15:14:45,885 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
> region 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961;
>  
> plan=hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961,
>  src=sv4borg42,60020,1300920459477, dest=sv4borg40,60020,1302041218196
> 2011-04-05 15:14:45,885 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
>  to sv4borg40,60020,1302041218196
> 2011-04-05 15:15:39,410 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
>  state=PENDING_OPEN, ts=1302041700944
> 2011-04-05 15:15:39,410 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_OPEN for too long, reassigning 
> region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:39,410 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
>  state=PENDING_OPEN, ts=1302041700944
> ...
> 2011-04-05 15:15:39,410 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
> was found (or we are ignoring an existing plan) for 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
>  so generated a random one; 
> hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961,
>  src=, dest=sv4borg42,60020,1300920459477; 19 (online=19, exclude=null) 
> available servers
> 2011-04-05 15:15:39,410 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
>  to sv4borg42,60020,1300920459477
> 2011-04-05 15:15:40,951 DEBUG 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: 
> master:60000-0x42ec2cece810b68 Received ZooKeeper Event, 
> type=NodeDataChanged, state=SyncConnected, 
> path=/prodjobs/unassigned/1470298961
> 2011-04-05 15:15:40,952 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:60000-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode 
> /prodjobs/unassigned/1470298961 and set watcher; 
> region=stumbles_by_userid2,'����穗���6,1266566087256, 
> server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_OPENED
> 2011-04-05 15:15:40,952 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENED, server=sv4borg42,60020,1300920459477, 
> region=1470298961
> 2011-04-05 15:15:42,222 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
> event for 1470298961; deleting unassigned node
> ...
> 2011-04-05 15:15:55,812 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:60000-0x42ec2cece810b68 Retrieved 93 byte(s) of data from znode 
> /prodjobs/unassigned/1470298961 and set watcher; 
> region=stumbles_by_userid2,'����穗���6,1266566087256, 
> server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING
> 2011-04-05 15:15:55,812 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
> node: /prodjobs/unassigned/1470298961 
> (region=stumbles_by_userid2,'����穗���6,1266566087256, 
> server=sv4borg42,60020,1300920459477, state=RS_ZK_REGION_CLOSING)
> 2011-04-05 15:15:55,812 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSING, server=sv4borg42,60020,1300920459477, 
> region=1470298961
> 2011-04-05 15:15:55,812 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSING for region 
> 1470298961 from server sv4borg42,60020,1300920459477 but region was in  the 
> state null and not in expected PENDING_CLOSE or CLOSING states
> {quote}
> And from sv4borg42:
> {quote}
> 2011-04-05 15:09:58,755 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:11:17,757 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:12:10,021 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:14:45,675 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
> close of 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:14:45,700 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Closing 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961:
>  disabling compactions & flushes
> 2011-04-05 15:14:45,701 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Updates disabled for region 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:14:45,701 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Closed 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:14:45,758 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed 
> region 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:39,410 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
> region: 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:39,410 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing 
> open of 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:39,486 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Opening region: REGION => {NAME => 
> 'stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256',
>  STARTKEY => '\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6', 
> ENDKEY => '\x00'\x9AU\x7F\xFF\xFE\xEBQ\xB0\xC3\xEF\x00Jr\xF2', ENCODED => 
> 1470298961, TABLE => ...
> 2011-04-05 15:15:39,487 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Instantiated 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:40,399 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Onlined 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961;
>  next sequenceid=37627407247
> 2011-04-05 15:15:40,488 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
> Updated row 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
>  in region .META.,,1 with server=sv4borg42:60020, startcode=1300920459477
> 2011-04-05 15:15:40,582 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,776 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
> close of 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Closing 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961:
>  disabling compactions & flushes
> 2011-04-05 15:15:55,809 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Updates disabled for region 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,809 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Closed 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,842 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed 
> region 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,943 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
> close of 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
> 2011-04-05 15:15:55,943 WARN 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Received 
> CLOSE for region 
> stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
>  but currently not serving
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply via email to