[ https://issues.apache.org/jira/browse/HBASE-17755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902844#comment-15902844 ]
Jean-Marc Spaggiari commented on HBASE-17755:
---------------------------------------------
Don't we risk generating very small new daughter regions?
> CellBasedKeyBlockIndexReader#midkey should exhaust search of the target
> middle key on skewed regions
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-17755
> URL: https://issues.apache.org/jira/browse/HBASE-17755
> Project: HBase
> Issue Type: Bug
> Components: HFile
> Reporter: Esteban Gutierrez
> Assignee: Esteban Gutierrez
>
> We have always returned the middle key of the block index regardless of
> the distribution of the data in an HFile. A side effect of that approach is
> that when millions of rows share the same key, it's quite easy to run into
> a situation where the start key or the end key is equal to the middle key,
> making that HFile nearly impossible to split until enough data is written
> into the region for the middle key to shift to another row, or until an
> operator uses a custom split point to split that region.
> Instead, we should exhaust the search for the middle key in the block index
> so that an HFile can be split earlier when possible, even if our edge case
> is to serve a region that could hold a single key with millions of versions
> of a row or with millions of qualifiers on the same row.
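To make the proposal in the description concrete, here is a minimal sketch of
what "exhausting the search" could look like. It is not the HBASE-17755 patch:
the class and method names (MidKeySketch, findSplitRow) are hypothetical, rows
are modelled as plain Strings rather than Cells compared with a CellComparator,
and the block index is flattened into one sorted list holding one row per
block. It assumes a candidate is usable only if it differs from both the first
and the last row of the file, and it walks outward from the middle entry until
it finds one.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not HBase code.
final class MidKeySketch {
    private MidKeySketch() {}

    /**
     * blockRows: one row per block index entry, sorted ascending (assumption).
     * Returns the usable row closest to the middle entry, or empty if every
     * entry equals the first or the last row.
     */
    static Optional<String> findSplitRow(List<String> blockRows) {
        int n = blockRows.size();
        if (n == 0) {
            return Optional.empty();
        }
        String first = blockRows.get(0);
        String last = blockRows.get(n - 1);
        int mid = n / 2;
        // Widen the search one entry at a time on both sides of the middle,
        // so the first hit is the candidate nearest to an even split.
        for (int offset = 0; offset < n; offset++) {
            int left = mid - offset;
            int right = mid + offset;
            if (left >= 0 && isUsable(blockRows.get(left), first, last)) {
                return Optional.of(blockRows.get(left));
            }
            if (right < n && isUsable(blockRows.get(right), first, last)) {
                return Optional.of(blockRows.get(right));
            }
        }
        return Optional.empty();
    }

    // Assumed validity rule: the split row must differ from both ends.
    private static boolean isUsable(String row, String first, String last) {
        return !row.equals(first) && !row.equals(last);
    }
}
```

In this sketch, the further the chosen entry sits from the middle, the more
uneven the two daughters would be, which is the concern raised in the comment
above. A real implementation might bound the offset to avoid very small
daughter regions.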