[
https://issues.apache.org/jira/browse/HBASE-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721935#comment-13721935
]
Jimmy Xiang edited comment on HBASE-8940 at 7/28/13 2:09 PM:
-------------------------------------------------------------
The above failed test is caused by
{noformat}
2013-07-27 20:08:48,345 INFO [MASTER_TABLE_OPERATIONS-quirinus:59057-0]
handler.DispatchMergingRegionHandler(170): Cancel merging regions
testWholesomeMerge,,1374955684854.8b452e80a9e15d54bd265c344f4ad953.,
testWholesomeMerge,testRow002
0,1374955684854.4a31455f2b0256853c41c52ba65bdc10., because can't move them
together after 30003ms
{noformat}
Timeout of moving regions together on same RS is caused by closing region
"4a31455f2b0256853c41c52ba65bdc10" take more time than 30s,
{noformat}
2013-07-27 20:08:18,753 DEBUG [RS_CLOSE_REGION-quirinus:57626-1]
handler.CloseRegionHandler(125): Processing close of
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.
2013-07-27 20:08:18,755 DEBUG [RS_CLOSE_REGION-quirinus:57626-1]
regionserver.HRegion(1493): Started memstore flush for
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.,
current region memstore size 3.4k
2013-07-27 20:09:03,914 INFO [RS_CLOSE_REGION-quirinus:57626-1]
regionserver.HRegion(1637): Finished memstore flush of ~3.4k/3520,
currentsize=0.0/0 for region
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc1
0. in 45159ms, sequenceid=186, compaction requested=false
2013-07-27 20:09:03,956 DEBUG [RS_CLOSE_REGION-quirinus:57626-1]
handler.CloseRegionHandler(177): Closed region
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.
{noformat}
>From the above logs, closing region took 45s, it seems flushing is too slow,
>but can't get the reason from the current logs, maybe GC or high IO load at
>that time
was (Author: zjushch):
The above failed test is caused by
{format}
2013-07-27 20:08:48,345 INFO [MASTER_TABLE_OPERATIONS-quirinus:59057-0]
handler.DispatchMergingRegionHandler(170): Cancel merging regions
testWholesomeMerge,,1374955684854.8b452e80a9e15d54bd265c344f4ad953.,
testWholesomeMerge,testRow002
0,1374955684854.4a31455f2b0256853c41c52ba65bdc10., because can't move them
together after 30003ms
{format}
Timeout of moving regions together on same RS is caused by closing region
"4a31455f2b0256853c41c52ba65bdc10" take more time than 30s,
{format}
2013-07-27 20:08:18,753 DEBUG [RS_CLOSE_REGION-quirinus:57626-1]
handler.CloseRegionHandler(125): Processing close of
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.
2013-07-27 20:08:18,755 DEBUG [RS_CLOSE_REGION-quirinus:57626-1]
regionserver.HRegion(1493): Started memstore flush for
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.,
current region memstore size 3.4k
2013-07-27 20:09:03,914 INFO [RS_CLOSE_REGION-quirinus:57626-1]
regionserver.HRegion(1637): Finished memstore flush of ~3.4k/3520,
currentsize=0.0/0 for region
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc1
0. in 45159ms, sequenceid=186, compaction requested=false
2013-07-27 20:09:03,956 DEBUG [RS_CLOSE_REGION-quirinus:57626-1]
handler.CloseRegionHandler(177): Closed region
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.
{format}
>From the above logs, closing region took 45s, it seems flushing is too slow,
>but can't get the reason from the current logs, maybe GC or high IO load at
>that time
> TestRegionMergeTransactionOnCluster#testWholesomeMerge may fail due to race
> in opening region
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-8940
> URL: https://issues.apache.org/jira/browse/HBASE-8940
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: chunhui shen
> Fix For: 0.95.2
>
> Attachments: 8940-trunk-v2.patch, 8940-v1.txt, 8940v3.txt
>
>
> From
> http://54.241.6.143/job/HBase-TRUNK-Hadoop-2/org.apache.hbase$hbase-server/395/testReport/org.apache.hadoop.hbase.regionserver/TestRegionMergeTransactionOnCluster/testWholesomeMerge/
> :
> {code}
> 013-07-11 09:33:44,154 INFO [AM.ZK.Worker-pool-2-thread-2]
> master.RegionStates(309): Offlined 3ffefd878a234031675de6b2c70b2ead from
> ip-10-174-118-204.us-west-1.compute.internal,60498,1373535184820
> 2013-07-11 09:33:44,154 INFO [AM.ZK.Worker-pool-2-thread-2]
> master.AssignmentManager$4(1223): The master has opened
> testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.
> that was online on
> ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
> 2013-07-11 09:33:44,182 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1]
> zookeeper.ZKAssign(862): regionserver:59210-0x13fcd13a20c0002 Successfully
> transitioned node 3ffefd878a234031675de6b2c70b2ead from RS_ZK_REGION_OPENING
> to RS_ZK_REGION_OPENED
> 2013-07-11 09:33:44,182 INFO
> [MASTER_TABLE_OPERATIONS-ip-10-174-118-204:39405-0]
> handler.DispatchMergingRegionHandler(154): Failed send MERGE REGIONS RPC to
> server ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884 for
> region
> testWholesomeMerge,,1373535210124.efcb10dcfa250e31bfd50dc6c7049f32.,testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.,
> focible=false, org.apache.hadoop.hbase.exceptions.RegionOpeningException:
> Region is being opened: 3ffefd878a234031675de6b2c70b2ead
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2566)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3862)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.mergeRegions(HRegionServer.java:3649)
> at
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:14400)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2124)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1831)
> 2013-07-11 09:33:44,182 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1]
> handler.OpenRegionHandler(373): region transitioned to opened in zookeeper:
> {ENCODED => 3ffefd878a234031675de6b2c70b2ead, NAME =>
> 'testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.',
> STARTKEY => 'testRow0020', ENDKEY => 'testRow0040'}, server:
> ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
> 2013-07-11 09:33:44,183 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1]
> handler.OpenRegionHandler(186): Opened
> testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.
> on server:ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
> {code}
> We can see that MASTER_TABLE_OPERATIONS thread couldn't get region
> 3ffefd878a234031675de6b2c70b2ead because RS_OPEN_REGION thread finished
> region opening 1 millisecond later.
> One solution is to retry operation when receiving RegionOpeningException
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira