[
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhihong Yu updated HBASE-5100:
------------------------------
Attachment: (was: 5100.txt)
> Rollback of split could cause closed region to be opened again
> --------------------------------------------------------------
>
> Key: HBASE-5100
> URL: https://issues.apache.org/jira/browse/HBASE-5100
> Project: HBase
> Issue Type: Bug
> Reporter: chunhui shen
> Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0
>
> Attachments: hbase-5100.patch
>
>
> If master sending close region to rs and region's split transaction
> concurrently happen,
> it may cause closed region to opened.
> See the detailed code in SplitTransaction#createDaughters
> {code}
> List<StoreFile> hstoreFilesToSplit = null;
> try{
> hstoreFilesToSplit = this.parent.close(false);
> if (hstoreFilesToSplit == null) {
> // The region was closed by a concurrent thread. We can't continue
> // with the split, instead we must just abandon the split. If we
> // reopen or split this could cause problems because the region has
> // probably already been moved to a different server, or is in the
> // process of moving to a different server.
> throw new IOException("Failed to close region: already closed by " +
> "another thread");
> }
> } finally {
> this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
> }
> {code}
> when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes
> this.parent.initialize();
> Although this region is not onlined in the regionserver, it may bring some
> potential problem.
> For example, in our environment, the closed parent region is rolled back
> sucessfully , and then starting compaction and split again.
> The parent region is f892dd6107b6b4130199582abc78e9c1
> master log
> {code}
> 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance
> hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
> src=dw87.kgb.sqa.cm4,60020,1324827866085,
> dest=dw80.kgb.sqa.cm4,60020,1324827865780
> 2011-12-26 00:24:42,693 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
> region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> (offlining)
> 2011-12-26 00:24:42,694 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to
> serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0,
> usedHeap=0, maxHeap=0) for region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:24:42,699 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned
> node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1
> (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
> server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
> 2011-12-26 00:24:42,699 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085,
> region=f892dd6107b6b4130199582abc78e9c1
> 2011-12-26 00:24:45,348 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085,
> region=f892dd6107b6b4130199582abc78e9c1
> 2011-12-26 00:24:45,349 DEBUG
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED
> event for f892dd6107b6b4130199582abc78e9c1
> 2011-12-26 00:24:45,349 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
> was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> state=CLOSED, ts=1324830285347
> 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> master:60000-0x13447f283f40e73 Creating (or updating) unassigned node for
> f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
> 2011-12-26 00:24:45,354 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:60000,
> region=f892dd6107b6b4130199582abc78e9c1
> 2011-12-26 00:24:45,354 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> destination server is + serverName=dw80.kgb.sqa.cm4,60020,1324827865780,
> load=(requests=0, regions=1, usedHeap=0, maxHeap=0)
> 2011-12-26 00:24:45,354 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for
> region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.;
>
> plan=hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
> src=dw87.kgb.sqa.cm4,60020,1324827866085,
> dest=dw80.kgb.sqa.cm4,60020,1324827865780
> 2011-12-26 00:24:45,354 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> to dw80.kgb.sqa.cm4,60020,1324827865780
> 2011-12-26 00:24:46,899 DEBUG
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> on dw80.kgb.sqa.cm4,60020,1324827865780
> {code}
> RE_dw87 log
> {code}
> 2011-12-26 00:24:42,694 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region:
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:24:42,694 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing
> close of
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x43447f30cb31367 Creating unassigned node for
> f892dd6107b6b4130199582abc78e9c1 in a CLOSING state
> 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Closing
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.:
> disabling compactions & flushes
> 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> waiting for compaction to complete for region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:24:43,340 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> aborted compaction on region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> after 49sec
> 2011-12-26 00:24:43,340 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Running close preflush of
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:24:43,340 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Started memstore flush for
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
> current region memstore size 59.5m
> 2011-12-26 00:24:43,340 INFO
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of
> region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:24:45,347 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Closed
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:24:45,347 WARN org.apache.hadoop.hbase.regionserver.HRegion:
> Region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> already closed
> 2011-12-26 00:24:45,347 WARN
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback of
> failed split of
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.;
> Failed to close region: already closed by another thread
> 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x43447f30cb31367 Successfully transitioned node
> f892dd6107b6b4130199582abc78e9c1 from RS_ZK_REGION_CLOSING to
> RS_ZK_REGION_CLOSED
> 2011-12-26 00:24:45,348 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed
> region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:24:46,837 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Onlined
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.;
> next sequenceid=717341809
> 2011-12-26 00:24:46,841 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Successful rollback
> of failed split of
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:24:46,841 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:25:23,288 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> completed compaction on region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> after 36sec
> 2011-12-26 00:25:23,288 INFO
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of
> region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> 2011-12-26 00:25:24,847 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Offlined parent region
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
> in META
> 2011-12-26 00:25:26,165 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region split, META
> updated, and report to master.
> Parent=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
> new regions:
> writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324830323288.595f458507cecf208640cb4a1be8e293.,
>
> writetest,DHZX0CD7A4OE1KRILMWEJL2HTBN6OJVSFOKGU0P938DJ1M44B79C068NCZPXK1Z5OD2RQJ6LMA41TC0D44H05525TO3AHLZ4BZXX,1324830323288.ba9376c83327c34c7926fccb68c3b9e3..
> Split took 2sec
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira