[ https://issues.apache.org/jira/browse/HBASE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated HBASE-5086:
--------------------------------------
Summary: Reopening a region on a RS can leave it in PENDING_OPEN (was:
Reopening a region on the a RS can leave it in PENDING_OPEN)
> Reopening a region on a RS can leave it in PENDING_OPEN
> -------------------------------------------------------
>
> Key: HBASE-5086
> URL: https://issues.apache.org/jira/browse/HBASE-5086
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: Jean-Daniel Cryans
> Fix For: 0.92.1
>
>
> I got this twice during the same test.
> If the region servers are slow enough and you run an online alter, it's
> possible for the RS to change the znode status to CLOSED and have the master
> send an OPEN before the region server is able to remove the region from its
> list of regions in transition (RITs).
> This is what the master sees:
> {quote}
> 2011-12-21 22:24:09,498 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> (offlining)
> 2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> master:62003-0x134589d3db033f7 Creating unassigned node for
> 43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
> 2011-12-21 22:24:09,524 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
> sv4r25s44,62023,1324494325099 for region
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,656 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099,
> region=43123e2e3fc83ec25fe2a76b4f09077f
> 2011-12-21 22:24:15,656 DEBUG
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED
> event for 43123e2e3fc83ec25fe2a76b4f09077f
> 2011-12-21 22:24:15,656 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
> was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for
> 43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
> 2011-12-21 22:24:15,663 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination
> server is + sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,663 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.;
> plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.,
> src=, dest=sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,663 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. to
> sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,664 ERROR
> org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in:
> sv4r25s44,62023,1324494325099 due to
> org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException:
> Received:OPEN for the
> region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which
> we are already trying to CLOSE.
> {quote}
> After that, the master gives up on the assignment.
> And the region server:
> {quote}
> 2011-12-21 22:24:09,523 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region:
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,523 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing
> close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.:
> disabling compactions & flushes
> 2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Running close preflush of
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Started memstore flush for
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current
> region memstore size 40.5m
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Finished snapshotting
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing
> wait for mvcc, flushsize=42482936
> 2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> Renaming flushed file at
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72
> to
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72
> 2011-12-21 22:24:13,568 INFO org.apache.hadoop.hbase.regionserver.Store:
> Added
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72,
> entries=54209, sequenceid=31451012, memsize=40.5m, filesize=31.4m
> 2011-12-21 22:24:14,381 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memstore flush of ~40.5m/42482936, currentsize=218.9k/224128 for
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in
> 4856ms, sequenceid=31451012, compaction requested=true
> 2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Updates disabled for region
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Started memstore flush for
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current
> region memstore size 218.9k
> 2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Finished snapshotting
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing
> wait for mvcc, flushsize=224128
> 2011-12-21 22:24:15,330 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> Renaming flushed file at
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/0a744b85cec5454e873a7c27bf9b3c53
> to
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53
> 2011-12-21 22:24:15,346 INFO org.apache.hadoop.hbase.regionserver.Store:
> Added
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53,
> entries=286, sequenceid=31451619, memsize=218.9k, filesize=170.2k
> 2011-12-21 22:24:15,347 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memstore flush of ~218.9k/224128, currentsize=0.0/0 for region
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 80ms,
> sequenceid=31451619, compaction requested=true
> 2011-12-21 22:24:15,365 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,365 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:62023-0x134589d3db03403 Attempting to transition node
> 43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to
> RS_ZK_REGION_CLOSED
> 2011-12-21 22:24:15,637 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:62023-0x134589d3db03403 Successfully transitioned node
> 43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to
> RS_ZK_REGION_CLOSED
> 2011-12-21 22:24:15,670 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: set region
> closed state in zk successfully for region
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. sn name:
> sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,670 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> {quote}
> Doing a force unassign in the shell fixes it.
> A small-ish fix would be to have RegionAlreadyInTransitionException report
> which state the region is in, so that we can detect this case and then just
> retry or open on another region server.
> This is critical for online altering to work, but I don't think it's likely
> to happen in other situations.
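
The proposed fix could look roughly like the sketch below. It is not the actual
HBase code: TransitionState, ToyRegionServer, assignWithRetry and the
getState() accessor are hypothetical names made up for illustration, and the
exception here is a toy stand-in for the real
RegionAlreadyInTransitionException. The assumption is that the exception
carries the state the region server is still in, so the master can retry when
that state is CLOSING instead of giving up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReopenRetrySketch {

    /** Hypothetical: what the region server is currently doing with a region. */
    enum TransitionState { OPENING, CLOSING }

    /** Toy stand-in for the real exception, extended to carry the in-flight state. */
    static class RegionAlreadyInTransitionException extends Exception {
        private final TransitionState state;

        RegionAlreadyInTransitionException(String region, TransitionState state) {
            super("Received OPEN for region " + region + ", which is already " + state);
            this.state = state;
        }

        TransitionState getState() {
            return state;
        }
    }

    /** Toy region server that only tracks its regions in transition. */
    static class ToyRegionServer {
        final Map<String, TransitionState> regionsInTransition = new ConcurrentHashMap<>();

        void openRegion(String region) throws RegionAlreadyInTransitionException {
            TransitionState current = regionsInTransition.get(region);
            if (current != null) {
                // Report which transition is still in flight, not just that there is one.
                throw new RegionAlreadyInTransitionException(region, current);
            }
            // A real server would start opening the region here.
        }
    }

    /**
     * Master side: retry only when the server says it is still CLOSING.
     * betweenAttempts stands in for a wait or backoff; returns the attempts used.
     */
    static int assignWithRetry(ToyRegionServer rs, String region, int maxAttempts,
                               Runnable betweenAttempts)
            throws RegionAlreadyInTransitionException {
        for (int attempt = 1; ; attempt++) {
            try {
                rs.openRegion(region);
                return attempt;
            } catch (RegionAlreadyInTransitionException e) {
                if (e.getState() != TransitionState.CLOSING || attempt >= maxAttempts) {
                    throw e; // a different problem, or out of retries
                }
                betweenAttempts.run();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ToyRegionServer rs = new ToyRegionServer();
        String region = "43123e2e3fc83ec25fe2a76b4f09077f";
        // The znode already says CLOSED, but the RS has not cleaned up its RIT entry yet.
        rs.regionsInTransition.put(region, TransitionState.CLOSING);
        // Simulate the RS finishing its cleanup between the first and second attempt.
        int attempts = assignWithRetry(rs, region, 3,
                () -> rs.regionsInTransition.remove(region));
        System.out.println("opened after " + attempts + " attempt(s)");
    }
}
```

Instead of retrying on the same server, the catch block could just as well pick
another region server; the important part is that the master can tell a stale
CLOSING entry apart from other in-transition cases.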