Reopening a region on the a RS can leave it in PENDING_OPEN
-----------------------------------------------------------
Key: HBASE-5086
URL: https://issues.apache.org/jira/browse/HBASE-5086
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Fix For: 0.92.1
I got this twice during the same test.
If the region servers are slow enough and you run an online alter, it's
possible for the RS to change the znode status to CLOSED and have the master
send an OPEN before the region server is able to remove the region from it's
list of RITs.
This is what the master sees:
{quote}
011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Starting unassignment of region
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. (offlining)
2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:62003-0x134589d3db033f7 Creating unassigned node for
43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Sent CLOSE to sv4r25s44,62023,1324494325099 for region
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099,
region=43123e2e3fc83ec25fe2a76b4f09077f
2011-12-21 22:24:15,656 DEBUG
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED
event for 43123e2e3fc83ec25fe2a76b4f09077f
2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Forcing OFFLINE;
was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for
43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Found an existing plan for
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination
server is + sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Using pre-existing plan for region
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.;
plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., src=,
dest=sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Assigning region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
to sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,664 ERROR org.apache.hadoop.hbase.master.AssignmentManager:
Failed assignment in: sv4r25s44,62023,1324494325099 due to
org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException:
Received:OPEN for the
region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which we
are already trying to CLOSE.
{quote}
After that the master abandons.
And the region server:
{quote}
2011-12-21 22:24:09,523 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region:
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:09,523 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing
close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.:
disabling compactions & flushes
2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Running close preflush of
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Started memstore flush for
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current region
memstore size 40.5m
2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Finished snapshotting
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing wait
for mvcc, flushsize=42482936
2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store:
Renaming flushed file at
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72
to
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72
2011-12-21 22:24:13,568 INFO org.apache.hadoop.hbase.regionserver.Store: Added
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72,
entries=54209, sequenceid=31451012, memsize=40.5m, filesize=31.4m
2011-12-21 22:24:14,381 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Finished memstore flush of ~40.5m/42482936, currentsize=218.9k/224128 for
region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in
4856ms, sequenceid=31451012, compaction requested=true
2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Started memstore flush for
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current region
memstore size 218.9k
2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Finished snapshotting
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing wait
for mvcc, flushsize=224128
2011-12-21 22:24:15,330 DEBUG org.apache.hadoop.hbase.regionserver.Store:
Renaming flushed file at
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/0a744b85cec5454e873a7c27bf9b3c53
to
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53
2011-12-21 22:24:15,346 INFO org.apache.hadoop.hbase.regionserver.Store: Added
hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53,
entries=286, sequenceid=31451619, memsize=218.9k, filesize=170.2k
2011-12-21 22:24:15,347 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Finished memstore flush of ~218.9k/224128, currentsize=0.0/0 for region
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 80ms,
sequenceid=31451619, compaction requested=true
2011-12-21 22:24:15,365 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
2011-12-21 22:24:15,365 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:62023-0x134589d3db03403 Attempting to transition node
43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
2011-12-21 22:24:15,637 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:62023-0x134589d3db03403 Successfully transitioned node
43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
2011-12-21 22:24:15,670 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: set region
closed state in zk successfully for region
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. sn name:
sv4r25s44,62023,1324494325099
2011-12-21 22:24:15,670 DEBUG
org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region
test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
{quote}
Doing a force unassign in the shell fixes it.
A small-ish fix would be to add to RegionAlreadyInTransitionException which
state it's in so that we can detect this case and then just retry or open on
another region server.
This is critical for online altering to work, but I don't think it's likely to
happen in other situations.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira