[
https://issues.apache.org/jira/browse/HBASE-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068916#comment-15068916
]
Ted Yu edited comment on HBASE-14867 at 12/23/15 12:00 AM:
-----------------------------------------------------------
Patch v2 uses a Triple to record each region's info, its size and its original
index, so that we can determine whether two candidate regions are adjacent.
This should cover the scenario Romil described.
After sorting by size, the regions would be in this order:
r1, r5, r6, r2, r3, r4
We would then identify r5 and r6 as the pair to merge.
Note: the relative order of r5 and r6, and of r2, r3 and r4, may differ from
the above, but r5 and r6 would still form one group and r2, r3 and r4 another.
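
The idea in the comment can be sketched in Java as below. This is a hedged illustration, not the code in 14867-v2.txt. The class AdjacentMergeSketch, its findMergePair method, and the Candidate holder (standing in for the patch's Triple) are all hypothetical. Region sizes are modeled as row counts in thousands, assuming about 27K rows each for r5 and r6. The sketch sorts candidates by size while keeping each region's original index. It then accepts the first consecutive pair in sorted order that is adjacent in key order and whose combined size is below the average region size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of "sort by size, then look for adjacent pairs".
// Names are illustrative only; this is not HBase API.
public class AdjacentMergeSketch {

    // Stands in for the patch's Triple: region name, size, original index.
    static final class Candidate {
        final String region;
        final long size;
        final int index;

        Candidate(String region, long size, int index) {
            this.region = region;
            this.size = size;
            this.index = index;
        }
    }

    // Returns the first key-adjacent pair, visited in ascending size order,
    // whose combined size is below the average; null if there is none.
    static String[] findMergePair(String[] regions, long[] sizes) {
        List<Candidate> candidates = new ArrayList<>();
        long total = 0;
        for (int i = 0; i < regions.length; i++) {
            candidates.add(new Candidate(regions[i], sizes[i], i));
            total += sizes[i];
        }
        double avg = (double) total / regions.length;

        // Sort by size; the original index survives the sort.
        candidates.sort(Comparator.comparingLong((Candidate c) -> c.size));

        for (int i = 0; i + 1 < candidates.size(); i++) {
            Candidate a = candidates.get(i);
            Candidate b = candidates.get(i + 1);
            boolean adjacent = Math.abs(a.index - b.index) == 1;
            if (adjacent && a.size + b.size < avg) {
                // Report the pair in key order.
                return a.index < b.index
                        ? new String[] {a.region, b.region}
                        : new String[] {b.region, a.region};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] regions = {"r1", "r2", "r3", "r4", "r5", "r6"};
        // Thousands of rows per region, following the scenario in the issue.
        long[] sizes = {0, 100, 100, 100, 27, 27};
        String[] pair = findMergePair(regions, sizes);
        System.out.println(pair == null ? "no merge" : pair[0] + " + " + pair[1]);
    }
}
```

With these sizes the average is 59. The sorted order is r1, r5, r6, r2, r3, r4, and the sketch prints `r5 + r6`. r1 and r5 are compared first and rejected because they are not adjacent. One limitation of this simple walk is that two adjacent regions which are not neighbors in the sorted order are never compared.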
> SimpleRegionNormalizer needs to have better heuristics to trigger merge
> operation
> ---------------------------------------------------------------------------------
>
> Key: HBASE-14867
> URL: https://issues.apache.org/jira/browse/HBASE-14867
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 1.2.0
> Reporter: Romil Choksi
> Assignee: Ted Yu
> Attachments: 14867-v2.txt
>
>
> SimpleRegionNormalizer needs to have better heuristics to trigger merge
> operations. SimpleRegionNormalizer is not able to trigger a merge action if
> the table's smallest region has neighboring regions that are larger than
> the table's average region size, even when there are other small regions
> whose combined size is less than the average region size.
> For example,
> - Consider a table with six regions, say r1 to r6.
> - Keep r1 empty and load some data, say 100K rows, into each of the
> regions r2, r3 and r4. Load a smaller amount of data into regions r5 and
> r6, say about 27K rows of data.
> - Run the normalizer. Verify the number of regions for that table and also
> check the master log to see whether any merge action was triggered as a
> result of normalization.
> In such a scenario, it would be better to trigger a merge action for the
> two smaller regions r5 and r6, even though neither of them is the
> smallest region.
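
For contrast, consider a check that looks only at the smallest region and its immediate neighbors. As the description above says, such a check misses this case. The sketch below is a simplified model, not SimpleRegionNormalizer's actual code. The class SmallestRegionOnlySketch and its neighborToMerge method are hypothetical. The rule "merge the smallest region with its smaller qualifying neighbor only when their combined size is below the average" is likewise an assumption for illustration. Sizes are again thousands of rows, assuming about 27K each for r5 and r6.

```java
// Hypothetical, simplified model of a "smallest region only" merge check.
// Not HBase code; names and the merge rule are assumptions.
public class SmallestRegionOnlySketch {

    // Returns the index of the neighbor to merge with the smallest region,
    // or -1 if no neighbor keeps the combined size below the average.
    static int neighborToMerge(long[] sizes) {
        long total = 0;
        int smallest = 0;
        for (int i = 0; i < sizes.length; i++) {
            total += sizes[i];
            if (sizes[i] < sizes[smallest]) {
                smallest = i;
            }
        }
        double avg = (double) total / sizes.length;

        int best = -1;
        for (int n : new int[] {smallest - 1, smallest + 1}) {
            if (n < 0 || n >= sizes.length) {
                continue; // no neighbor on this side
            }
            boolean fits = sizes[smallest] + sizes[n] < avg;
            if (fits && (best == -1 || sizes[n] < sizes[best])) {
                best = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // r1..r6 in key order, thousands of rows.
        long[] sizes = {0, 100, 100, 100, 27, 27};
        System.out.println(neighborToMerge(sizes)); // prints -1
    }
}
```

Here r1 is the smallest region, but its only neighbor r2 holds 100K rows, which exceeds the 59K average. No merge is proposed even though r5 and r6 together hold only about 54K rows.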
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)