Reopening a region on the a RS can leave it in PENDING_OPEN
-----------------------------------------------------------

                 Key: HBASE-5086
                 URL: https://issues.apache.org/jira/browse/HBASE-5086
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.92.0
            Reporter: Jean-Daniel Cryans
             Fix For: 0.92.1


I got this twice during the same test.

If the region servers are slow enough and you run an online alter, it's 
possible for the RS to change the znode status to CLOSED and have the master 
send an OPEN before the region server is able to remove the region from it's 
list of RITs.

This is what the master sees:

{quote}
011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Starting unassignment of region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. (offlining)
2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:62003-0x134589d3db033f7 Creating unassigned node for 
43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Sent CLOSE to sv4r25s44,62023,1324494325099 for region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099, 
region=43123e2e3fc83ec25fe2a76b4f09077f
2011-12-21 22:24:15,656 DEBUG 
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
event for 43123e2e3fc83ec25fe2a76b4f09077f
2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Forcing OFFLINE; 
was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for 
43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Found an existing plan for 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination 
server is + sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Using pre-existing plan for region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.; 
plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., src=, 
dest=sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
to sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,664 ERROR org.apache.hadoop.hbase.master.AssignmentManager: 
Failed assignment in: sv4r25s44,62023,1324494325099 due to 
org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: 
Received:OPEN for the 
region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which we 
are already trying to CLOSE.
{quote}

After that the master abandons.

And the region server:
{quote}
2011-12-21 22:24:09,523 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:09,523 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.: 
disabling compactions & flushes
2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Running close preflush of 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Started memstore flush for 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current region 
memstore size 40.5m
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Finished snapshotting 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing wait 
for mvcc, flushsize=42482936
2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
Renaming flushed file at 
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72
 to 
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72
2011-12-21 22:24:13,568 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72,
 entries=54209, sequenceid=31451012, memsize=40.5m, filesize=31.4m
2011-12-21 22:24:14,381 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~40.5m/42482936, currentsize=218.9k/224128 for 
region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 
4856ms, sequenceid=31451012, compaction requested=true
2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Updates disabled for region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Started memstore flush for 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current region 
memstore size 218.9k
2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Finished snapshotting 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing wait 
for mvcc, flushsize=224128
2011-12-21 22:24:15,330 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
Renaming flushed file at 
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/0a744b85cec5454e873a7c27bf9b3c53
 to 
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53
2011-12-21 22:24:15,346 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53,
 entries=286, sequenceid=31451619, memsize=218.9k, filesize=170.2k
2011-12-21 22:24:15,347 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Finished memstore flush of ~218.9k/224128, currentsize=0.0/0 for region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 80ms, 
sequenceid=31451619, compaction requested=true
2011-12-21 22:24:15,365 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Closed test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:15,365 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:62023-0x134589d3db03403 Attempting to transition node 
43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
2011-12-21 22:24:15,637 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:62023-0x134589d3db03403 Successfully transitioned node 
43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
2011-12-21 22:24:15,670 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: set region 
closed state in zk successfully for region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. sn name: 
sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,670 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region 
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
{quote}

Doing a force unassign in the shell fixes it.

A small-ish fix would be to add to RegionAlreadyInTransitionException which 
state it's in so that we can detect this case and then just retry or open on 
another region server.

This is critical for online altering to work, but I don't think it's likely to 
happen in other situations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to