[
https://issues.apache.org/jira/browse/HBASE-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068772#comment-15068772
]
Ted Yu commented on HBASE-14867:
--------------------------------
I wonder if RegionNormalizer should have knowledge of the history of each
region.
This would allow the normalizer to distinguish whether region R has been empty
since inception or was populated with data but later became empty because the
data was cleaned up (due to TTL).
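
As a rough illustration of what such history could look like, here is a
minimal sketch. It is not existing HBase code: the class RegionHistorySketch,
its methods, and the State enum are hypothetical names made up for this
example. It assumes the normalizer records each region's size on every run,
so "since inception" is only approximated by "since the first observation".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not part of RegionNormalizer or any HBase API. */
final class RegionHistorySketch {

    enum State { UNKNOWN, NEVER_POPULATED, EMPTIED_AFTER_DATA, HAS_DATA }

    // Largest size ever observed for each region (assumed unit: any size metric).
    private final Map<String, Long> peakSize = new HashMap<>();
    // Most recently observed size for each region.
    private final Map<String, Long> lastSize = new HashMap<>();

    /** Records one size observation, e.g. once per normalizer run. */
    void observe(String regionName, long size) {
        peakSize.merge(regionName, size, Long::max);
        lastSize.put(regionName, size);
    }

    /** Classifies a region based on everything observed so far. */
    State classify(String regionName) {
        Long last = lastSize.get(regionName);
        if (last == null) {
            return State.UNKNOWN; // never observed
        }
        if (last > 0) {
            return State.HAS_DATA;
        }
        // Currently empty: check whether it ever held data.
        return peakSize.get(regionName) > 0
            ? State.EMPTIED_AFTER_DATA
            : State.NEVER_POPULATED;
    }
}
```

A region classified as EMPTIED_AFTER_DATA could then be treated differently
from one that is NEVER_POPULATED when choosing merge candidates; which policy
to apply to each case is left open here.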
> SimpleRegionNormalizer needs to have better heuristics to trigger merge
> operation
> ---------------------------------------------------------------------------------
>
> Key: HBASE-14867
> URL: https://issues.apache.org/jira/browse/HBASE-14867
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 1.2.0
> Reporter: Romil Choksi
>
> SimpleRegionNormalizer needs to have better heuristics to trigger a merge
> operation. SimpleRegionNormalizer is not able to trigger a merge action if
> the table's smallest region has neighboring regions that are larger than
> the table's average region size, even when there are other small regions
> whose combined size is less than the average region size.
> For example,
> - Consider a table with six regions, say r1 to r6.
> - Keep r1 empty and create some data, say 100K rows, for each of the
> regions r2, r3 and r4. Create a smaller amount of data for regions r5 and
> r6, say about 27K rows.
> - Run the normalizer. Verify the number of regions for that table and also
> check the master log to see if any merge action was triggered as a result of
> normalization.
> In such a scenario, it would be better to have a merge action triggered for
> the two smaller regions r5 and r6, even though neither of them is the
> smallest one.
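
The heuristic requested in the description can be sketched as follows. This
is a simplified illustration, not the actual SimpleRegionNormalizer code: the
class MergeCandidateSketch and the method findMergeCandidates are hypothetical,
row counts stand in for region size, and r5 and r6 are assumed to hold 27K
rows each. The idea is to scan every adjacent pair of regions instead of
looking only at the smallest region and its neighbors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual SimpleRegionNormalizer logic. */
final class MergeCandidateSketch {

    /**
     * Returns index pairs {i, i + 1} of adjacent regions whose combined size
     * is below the table's average region size. Sizes must be in key order.
     */
    static List<int[]> findMergeCandidates(long[] regionSizes) {
        List<int[]> candidates = new ArrayList<>();
        if (regionSizes.length < 2) {
            return candidates;
        }
        long total = 0;
        for (long size : regionSizes) {
            total += size;
        }
        double average = (double) total / regionSizes.length;

        int i = 0;
        while (i < regionSizes.length - 1) {
            long combined = regionSizes[i] + regionSizes[i + 1];
            if (combined < average) {
                candidates.add(new int[] {i, i + 1});
                i += 2; // both regions are used; do not pair them again
            } else {
                i++;
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // r1 empty, r2..r4 with 100K rows, r5 and r6 with 27K rows (assumed each).
        long[] sizes = {0, 100_000, 100_000, 100_000, 27_000, 27_000};
        for (int[] pair : findMergeCandidates(sizes)) {
            System.out.println("merge r" + (pair[0] + 1) + " and r" + (pair[1] + 1));
        }
    }
}
```

With these numbers the average is 59,000 rows. r1 plus r2 is 100,000, so the
empty region does not qualify, while r5 plus r6 is 54,000 and is reported as
the only merge candidate.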
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)