[
https://issues.apache.org/jira/browse/HBASE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
chunhui shen updated HBASE-8287:
--------------------------------
Attachment: hbase-trunk-8287.patch
The root cause of the bug is in DispatchMergingRegionHandler#process:
{code}
region_b_location = masterServices.getAssignmentManager()
    .getRegionStates().getRegionServerOfRegion(region_b);
onSameRS = region_a_location.equals(region_b_location);
if (onSameRS || !regionStates.isRegionInTransition(region_b)) {
  // Regions are on the same RS, or region_b is not in
  // RegionInTransition any more
  break;
}
{code}
The value of onSameRS can be false even if the regions are already on the same
regionserver, as in the following case:
1. getRegionServerOfRegion(region_b)
2. region_b comes online
3. isRegionInTransition(region_b)
The above steps are synchronized by RegionStates.
Step 1 returns the old server of region_b, thus the value of onSameRS is false;
Step 3 returns false since region_b is online after step 2.
Finally, the while loop breaks with onSameRS false, although the merging regions
are in fact on the same regionserver.
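
The interleaving above can be replayed deterministically with a small
standalone sketch. Everything in it is hypothetical and for illustration only:
RegionStatesSketch is a stand-in with synchronized methods, not the real
RegionStates class, and the server names "rs1"/"rs2" are made up. The second
method shows one possible ordering that avoids the stale read (check
RegionInTransition first, then read the location); it is not necessarily what
the attached patch does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MergeRaceSketch {

  /** Hypothetical stand-in for RegionStates: each method is synchronized. */
  static class RegionStatesSketch {
    private final Map<String, String> serverOfRegion = new HashMap<>();
    private final Set<String> inTransition = new HashSet<>();

    synchronized void startMove(String region, String oldServer) {
      serverOfRegion.put(region, oldServer);
      inTransition.add(region);
    }

    synchronized void regionOnline(String region, String newServer) {
      serverOfRegion.put(region, newServer);
      inTransition.remove(region);
    }

    synchronized String getRegionServerOfRegion(String region) {
      return serverOfRegion.get(region);
    }

    synchronized boolean isRegionInTransition(String region) {
      return inTransition.contains(region);
    }
  }

  /** Replays steps 1-3 from the comment; returns onSameRS when the loop breaks. */
  static boolean originalOrder() {
    RegionStatesSketch states = new RegionStatesSketch();
    String regionALocation = "rs1";
    states.startMove("region_b", "rs2");

    // Step 1: read the location while region_b is still moving (stale: "rs2").
    String regionBLocation = states.getRegionServerOfRegion("region_b");
    // Step 2: region_b comes online on the same server as region_a.
    states.regionOnline("region_b", "rs1");
    boolean onSameRS = regionALocation.equals(regionBLocation);
    // Step 3: region_b is no longer in transition, so the loop breaks.
    if (onSameRS || !states.isRegionInTransition("region_b")) {
      return onSameRS;
    }
    throw new IllegalStateException("loop would continue");
  }

  /** Same interleaving point, but the transition check precedes the location read. */
  static boolean reorderedOrder() {
    RegionStatesSketch states = new RegionStatesSketch();
    String regionALocation = "rs1";
    states.startMove("region_b", "rs2");

    boolean inTransition = states.isRegionInTransition("region_b"); // still true
    states.regionOnline("region_b", "rs1");
    // The location is read after the transition check, so it is never older.
    String regionBLocation = states.getRegionServerOfRegion("region_b");
    boolean onSameRS = regionALocation.equals(regionBLocation);
    if (onSameRS || !inTransition) {
      return onSameRS;
    }
    throw new IllegalStateException("loop would continue");
  }

  public static void main(String[] args) {
    System.out.println("original order:  onSameRS=" + originalOrder());
    System.out.println("reordered check: onSameRS=" + reorderedOrder());
  }
}
```

In this sketch the original order breaks out of the loop with onSameRS=false,
while the reordered check ends with onSameRS=true for the same interleaving.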
> TestRegionMergeTransactionOnCluster failed in trunk build #4010
> ---------------------------------------------------------------
>
> Key: HBASE-8287
> URL: https://issues.apache.org/jira/browse/HBASE-8287
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: chunhui shen
> Assignee: chunhui shen
> Priority: Minor
> Fix For: 0.95.2, 0.98.0
>
> Attachments: hbase-trunk-8287.patch
>
>
> From the log of trunk build #4010:
> {code}
> 2013-04-04 05:45:59,396 INFO
> [MASTER_TABLE_OPERATIONS-quirinus.apache.org,53514,1365054344859-0]
> handler.DispatchMergingRegionHandler(157):
> Cancel merging regions
> testCleanMergeReference,,1365054353296.bf3d60360122d6c83a246f5f96c2cdd1.,
> testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa.,
>
> because can't move them together after 842ms
>
> 2013-04-04 05:45:59,396 INFO [hbase-am-zkevent-worker-pool-2-thread-1]
> master.AssignmentManager$4(1164):
> The master has opened the region
> testCleanMergeReference,testRow0020,1365054353302.72fbc04566e78aa6732531296256a5aa.
> that was online on quirinus.apache.org,45718,1365054345790
> {code}
> There's a small probability of failing to move the merging regions together
> to the same regionserver.