[ https://issues.apache.org/jira/browse/HBASE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated HBASE-5086:
--------------------------------------
    Summary: Reopening a region on a RS can leave it in PENDING_OPEN  (was: Reopening a region on the a RS can leave it in PENDING_OPEN)

> Reopening a region on a RS can leave it in PENDING_OPEN
> -------------------------------------------------------
>
> Key: HBASE-5086
> URL: https://issues.apache.org/jira/browse/HBASE-5086
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: Jean-Daniel Cryans
> Fix For: 0.92.1
>
>
> I got this twice during the same test.
> If the region servers are slow enough and you run an online alter, it's possible for the RS to change the znode status to CLOSED and have the master send an OPEN before the region server is able to remove the region from its list of RITs.
> This is what the master sees:
> {quote}
> 2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. (offlining)
> 2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db033f7 Creating unassigned node for 43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r25s44,62023,1324494325099 for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099, region=43123e2e3fc83ec25fe2a76b4f09077f
> 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 43123e2e3fc83ec25fe2a76b4f09077f
> 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for 43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
> 2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination server is + sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.; plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., src=, dest=sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. to sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,664 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in: sv4r25s44,62023,1324494325099 due to org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: Received:OPEN for the region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which we are already trying to CLOSE.
> {quote}
> After that the master gives up.
> And the region server:
> {quote}
> 2011-12-21 22:24:09,523 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,523 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.: disabling compactions & flushes
> 2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion: Running close preflush of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current region memstore size 40.5m
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing wait for mvcc, flushsize=42482936
> 2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72 to hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72
> 2011-12-21 22:24:13,568 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72, entries=54209, sequenceid=31451012, memsize=40.5m, filesize=31.4m
> 2011-12-21 22:24:14,381 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~40.5m/42482936, currentsize=218.9k/224128 for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 4856ms, sequenceid=31451012, compaction requested=true
> 2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current region memstore size 218.9k
> 2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing wait for mvcc, flushsize=224128
> 2011-12-21 22:24:15,330 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/0a744b85cec5454e873a7c27bf9b3c53 to hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53
> 2011-12-21 22:24:15,346 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53, entries=286, sequenceid=31451619, memsize=218.9k, filesize=170.2k
> 2011-12-21 22:24:15,347 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~218.9k/224128, currentsize=0.0/0 for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 80ms, sequenceid=31451619, compaction requested=true
> 2011-12-21 22:24:15,365 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,365 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x134589d3db03403 Attempting to transition node 43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
> 2011-12-21 22:24:15,637 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x134589d3db03403 Successfully transitioned node 43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
> 2011-12-21 22:24:15,670 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: set region closed state in zk successfully for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. sn name: sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,670 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> {quote}
> Doing a force unassign in the shell fixes it.
> A small-ish fix would be to have RegionAlreadyInTransitionException report which state the region is in, so that we can detect this case and then just retry or open the region on another region server.
> This is critical for online altering to work, but I don't think it's likely to happen in other situations.
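
The fix proposed in the description (have the exception say which transition the region is in, then retry or open elsewhere) could look roughly like the sketch below. This is only an illustration under stated assumptions, not HBase code: `RegionInTransitionWithStateException`, `TransitionState`, `Action`, `onRejectedOpen`, and the retry limit are all hypothetical names invented here, and the real RegionAlreadyInTransitionException and AssignmentManager may be shaped quite differently.

```java
public class ReopenRetrySketch {

    /** Hypothetical: the transition the region server reports for the region. */
    enum TransitionState { OPENING, CLOSING }

    /**
     * Hypothetical stand-in for RegionAlreadyInTransitionException that also
     * carries the state the region was in when the OPEN was rejected.
     */
    static class RegionInTransitionWithStateException extends Exception {
        private final TransitionState state;

        RegionInTransitionWithStateException(String regionName, TransitionState state) {
            super("Received OPEN for region " + regionName + ", which is already " + state);
            this.state = state;
        }

        TransitionState getState() {
            return state;
        }
    }

    /** Hypothetical choices for the master after a rejected OPEN. */
    enum Action { RETRY_SAME_SERVER, ASSIGN_ELSEWHERE, LEAVE_AS_IS }

    /**
     * Assumed policy: a region that is still CLOSING on the target server is
     * expected to finish closing shortly, so retry a bounded number of times
     * and then fall back to another server. Any other state is left alone.
     */
    static Action onRejectedOpen(RegionInTransitionWithStateException e,
                                 int attempt, int maxRetries) {
        if (e.getState() == TransitionState.CLOSING) {
            return attempt < maxRetries ? Action.RETRY_SAME_SERVER : Action.ASSIGN_ELSEWHERE;
        }
        return Action.LEAVE_AS_IS;
    }

    public static void main(String[] args) {
        RegionInTransitionWithStateException e =
            new RegionInTransitionWithStateException(
                "43123e2e3fc83ec25fe2a76b4f09077f", TransitionState.CLOSING);
        // First attempt with a retry budget of 3.
        System.out.println(onRejectedOpen(e, 0, 3));
    }
}
```

In the logs above, the master's OPEN was rejected at 22:24:15,664 and the region server's close handler finished at 22:24:15,670, so in this particular run even a single short retry might have been enough; the retry budget in the sketch is an arbitrary placeholder.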