[ 
https://issues.apache.org/jira/browse/HBASE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5086:
--------------------------------------

    Summary: Reopening a region on a RS can leave it in PENDING_OPEN  (was: 
Reopening a region on the a RS can leave it in PENDING_OPEN)
    
> Reopening a region on a RS can leave it in PENDING_OPEN
> -------------------------------------------------------
>
>                 Key: HBASE-5086
>                 URL: https://issues.apache.org/jira/browse/HBASE-5086
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.92.1
>
>
> I got this twice during the same test.
> If the region servers are slow enough and you run an online alter, it's 
> possible for the RS to change the znode status to CLOSED and have the master 
> send an OPEN before the region server is able to remove the region from it's 
> list of RITs.
> This is what the master sees:
> {quote}
> 011-12-21 22:24:09,498 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
> (offlining)
> 2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db033f7 Creating unassigned node for 
> 43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
> 2011-12-21 22:24:09,524 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r25s44,62023,1324494325099 for region 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,656 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099, 
> region=43123e2e3fc83ec25fe2a76b4f09077f
> 2011-12-21 22:24:15,656 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 43123e2e3fc83ec25fe2a76b4f09077f
> 2011-12-21 22:24:15,656 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
> state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for 
> 43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
> 2011-12-21 22:24:15,663 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination 
> server is + sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,663 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.; 
> plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., 
> src=, dest=sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,663 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. to 
> sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,664 ERROR 
> org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in: 
> sv4r25s44,62023,1324494325099 due to 
> org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: 
> Received:OPEN for the 
> region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which 
> we are already trying to CLOSE.
> {quote}
> After that the master abandons.
> And the region server:
> {quote}
> 2011-12-21 22:24:09,523 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,523 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
> close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.: 
> disabling compactions & flushes
> 2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Running close preflush of 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Started memstore flush for 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current 
> region memstore size 40.5m
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Finished snapshotting 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing 
> wait for mvcc, flushsize=42482936
> 2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Renaming flushed file at 
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72
>  to 
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72
> 2011-12-21 22:24:13,568 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Added 
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72,
>  entries=54209, sequenceid=31451012, memsize=40.5m, filesize=31.4m
> 2011-12-21 22:24:14,381 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Finished memstore flush of ~40.5m/42482936, currentsize=218.9k/224128 for 
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 
> 4856ms, sequenceid=31451012, compaction requested=true
> 2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Updates disabled for region 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Started memstore flush for 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current 
> region memstore size 218.9k
> 2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Finished snapshotting 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing 
> wait for mvcc, flushsize=224128
> 2011-12-21 22:24:15,330 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Renaming flushed file at 
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/0a744b85cec5454e873a7c27bf9b3c53
>  to 
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53
> 2011-12-21 22:24:15,346 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Added 
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53,
>  entries=286, sequenceid=31451619, memsize=218.9k, filesize=170.2k
> 2011-12-21 22:24:15,347 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Finished memstore flush of ~218.9k/224128, currentsize=0.0/0 for region 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 80ms, 
> sequenceid=31451619, compaction requested=true
> 2011-12-21 22:24:15,365 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Closed test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,365 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:62023-0x134589d3db03403 Attempting to transition node 
> 43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to 
> RS_ZK_REGION_CLOSED
> 2011-12-21 22:24:15,637 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:62023-0x134589d3db03403 Successfully transitioned node 
> 43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to 
> RS_ZK_REGION_CLOSED
> 2011-12-21 22:24:15,670 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: set region 
> closed state in zk successfully for region 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. sn name: 
> sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,670 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed 
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> {quote}
> Doing a force unassign in the shell fixes it.
> A small-ish fix would be to add to RegionAlreadyInTransitionException which 
> state it's in so that we can detect this case and then just retry or open on 
> another region server.
> This is critical for online altering to work, but I don't think it's likely 
> to happen in other situations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to