[
https://issues.apache.org/jira/browse/HBASE-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708090#comment-13708090
]
stack commented on HBASE-8940:
------------------------------
I took a closer look at the logs. Something else is going on here. We load
the table, scan the table to ensure row count is as expected (which can only
succeed after all regions are online), we then print out the regions we think
make up the table, and then immediately after, we choose to unassign a region.
The merge fails because this unexpected unassign followed by an open elsewhere
has not completed by the time the merge goes to run. I cannot see why we do
the unassign. I see that the balancer is set to off for these tests. Any
clues [~jxiang]? Otherwise, I'm going to add in a 'why' string for unassigns,
etc.
Here is us listing all regions after successful verification of row count
followed by the immediate unassign:
{code}
...
2013-07-11 09:33:43,744 INFO [pool-1-thread-1]
regionserver.TestRegionMergeTransactionOnCluster(262): Regions after load:
{{ENCODED => efcb10dcfa250e31bfd50dc6c7049f32, NAME =>
'testWholesomeMerge,,1373535210124.efcb10dcfa250e31bfd50dc6c7049f32.', STARTKEY
=> '', ENDKEY =>
'testRow0020'},ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884},{{ENCODED
=> 3ffefd878a234031675de6b2c70b2ead, NAME =>
'testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.',
STARTKEY => 'testRow0020', ENDKEY =>
'testRow0040'},ip-10-174-118-204.us-west-1.compute.internal,60498,1373535184820},{{ENCODED
=> 96a2b66754838d21e88d31cbeb21ed41, NAME =>
'testWholesomeMerge,testRow0040,1373535210125.96a2b66754838d21e88d31cbeb21ed41.',
STARTKEY => 'testRow0040', ENDKEY =>
'testRow0060'},ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884},{{ENCODED
=> 71eb1ffffd93a0d6a9f5529a45922519, NAME =>
'testWholesomeMerge,testRow0060,1373535210125.71eb1ffffd93a0d6a9f5529a45922519.',
STARTKEY => 'testRow0060', ENDKEY =>
'testRow0080'},ip-10-174-118-204.us-west-1.compute.internal,60498,1373535184820},{{ENCODED
=> f0cc6807bbeb1b4731f81397710ead3d, NAME =>
'testWholesomeMerge,testRow0080,1373535210125.f0cc6807bbeb1b4731f81397710ead3d.',
STARTKEY => 'testRow0080', ENDKEY =>
'testRow0100'},ip-10-174-118-204.us-west-1.compute.internal,58569,1373535184726},{{ENCODED
=> bc7ff4e217795751fd2a9fc671ad33a4, NAME =>
'testWholesomeMerge,testRow0100,1373535210125.bc7ff4e217795751fd2a9fc671ad33a4.',
STARTKEY => 'testRow0100', ENDKEY =>
'testRow0120'},ip-10-174-118-204.us-west-1.compute.internal,58569,1373535184726},{{ENCODED
=> 7bd39de7c7cc8e46007e50d2e60e4e3d, NAME =>
'testWholesomeMerge,testRow0120,1373535210125.7bd39de7c7cc8e46007e50d2e60e4e3d.',
STARTKEY => 'testRow0120', ENDKEY =>
'testRow0140'},ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884},{{ENCODED
=> f6ca1923a2636032360186166784a35c, NAME =>
'testWholesomeMerge,testRow0140,1373535210125.f6ca1923a2636032360186166784a35c.',
STARTKEY => 'testRow0140', ENDKEY =>
'testRow0160'},ip-10-174-118-204.us-west-1.compute.internal,60498,1373535184820},{{ENCODED
=> cc4e5e073976cf056ea7b629f3eaeee6, NAME =>
'testWholesomeMerge,testRow0160,1373535210125.cc4e5e073976cf056ea7b629f3eaeee6.',
STARTKEY => 'testRow0160', ENDKEY =>
'testRow0180'},ip-10-174-118-204.us-west-1.compute.internal,58569,1373535184726},{{ENCODED
=> 95559e718b7b1dc87635f3bd474b9d36, NAME =>
'testWholesomeMerge,testRow0180,1373535210125.95559e718b7b1dc87635f3bd474b9d36.',
STARTKEY => 'testRow0180', ENDKEY =>
''},ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884}
2013-07-11 09:33:43,751 DEBUG [pool-1-thread-1] client.ClientScanner(195):
Finished region={ENCODED => 1028785192, NAME => '.META.,,1', STARTKEY => '',
ENDKEY => ''}
2013-07-11 09:33:43,767 DEBUG
[MASTER_TABLE_OPERATIONS-ip-10-174-118-204:39405-0]
master.AssignmentManager(2205): Starting unassign of
testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.
(offlining)
...
{code}
> TestRegionMergeTransactionOnCluster#testWholesomeMerge may fail due to race
> in opening region
> ---------------------------------------------------------------------------------------------
>
> Key: HBASE-8940
> URL: https://issues.apache.org/jira/browse/HBASE-8940
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 8940-v1.txt
>
>
> From
> http://54.241.6.143/job/HBase-TRUNK-Hadoop-2/org.apache.hbase$hbase-server/395/testReport/org.apache.hadoop.hbase.regionserver/TestRegionMergeTransactionOnCluster/testWholesomeMerge/
> :
> {code}
> 013-07-11 09:33:44,154 INFO [AM.ZK.Worker-pool-2-thread-2]
> master.RegionStates(309): Offlined 3ffefd878a234031675de6b2c70b2ead from
> ip-10-174-118-204.us-west-1.compute.internal,60498,1373535184820
> 2013-07-11 09:33:44,154 INFO [AM.ZK.Worker-pool-2-thread-2]
> master.AssignmentManager$4(1223): The master has opened
> testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.
> that was online on
> ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
> 2013-07-11 09:33:44,182 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1]
> zookeeper.ZKAssign(862): regionserver:59210-0x13fcd13a20c0002 Successfully
> transitioned node 3ffefd878a234031675de6b2c70b2ead from RS_ZK_REGION_OPENING
> to RS_ZK_REGION_OPENED
> 2013-07-11 09:33:44,182 INFO
> [MASTER_TABLE_OPERATIONS-ip-10-174-118-204:39405-0]
> handler.DispatchMergingRegionHandler(154): Failed send MERGE REGIONS RPC to
> server ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884 for
> region
> testWholesomeMerge,,1373535210124.efcb10dcfa250e31bfd50dc6c7049f32.,testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.,
> focible=false, org.apache.hadoop.hbase.exceptions.RegionOpeningException:
> Region is being opened: 3ffefd878a234031675de6b2c70b2ead
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2566)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3862)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.mergeRegions(HRegionServer.java:3649)
> at
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:14400)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2124)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1831)
> 2013-07-11 09:33:44,182 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1]
> handler.OpenRegionHandler(373): region transitioned to opened in zookeeper:
> {ENCODED => 3ffefd878a234031675de6b2c70b2ead, NAME =>
> 'testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.',
> STARTKEY => 'testRow0020', ENDKEY => 'testRow0040'}, server:
> ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
> 2013-07-11 09:33:44,183 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1]
> handler.OpenRegionHandler(186): Opened
> testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.
> on server:ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
> {code}
> We can see that MASTER_TABLE_OPERATIONS thread couldn't get region
> 3ffefd878a234031675de6b2c70b2ead because RS_OPEN_REGION thread finished
> region opening 1 millisecond later.
> One solution is to retry operation when receiving RegionOpeningException
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira