[ 
https://issues.apache.org/jira/browse/HBASE-17755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900424#comment-15900424
 ] 

Esteban Gutierrez commented on HBASE-17755:
-------------------------------------------

I worked on this with [[email protected]] and we should follow up with 
additional improvements.

> CellBasedKeyBlockIndexReader#midkey should exhaust search of the target 
> middle key on skewed regions
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17755
>                 URL: https://issues.apache.org/jira/browse/HBASE-17755
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile
>            Reporter: Esteban Gutierrez
>            Assignee: Esteban Gutierrez
>
> We have always returned the middle key of the block index regardless of how 
> the data is distributed in an HFile. A side effect of that approach is that 
> when millions of rows share the same key, it's quite easy to run into a 
> situation where the start key or the end key is equal to the middle key. 
> That makes the HFile nearly impossible to split until enough data is written 
> into the region for the middle key to shift to another row, or until an 
> operator uses a custom split point to split that region. 
> Instead, we should exhaust the search for the middle key in the block index 
> so that we can split an HFile earlier when possible, even if our edge case 
> is serving a region that holds a single key with millions of versions of a 
> row or millions of qualifiers on the same row.
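
For illustration only, the exhaustive search described above could be sketched 
roughly as follows. This is not the HBase implementation: the class 
MidKeySketch, the method findSplittableMidKey, and the use of String row keys 
(HBase keys are byte arrays) are all assumptions made for this example. The 
idea is to start at the middle block-index entry and walk outward until a key 
is found that differs from both the start key and the end key.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not the actual CellBasedKeyBlockIndexReader code.
public class MidKeySketch {

    /**
     * blockFirstKeys: first row key of each block, in sorted order (assumed).
     * startKey / endKey: first and last row key of the file (assumed inputs).
     * Returns the candidate closest to the middle that equals neither
     * boundary key, or empty if every block starts with a boundary key.
     */
    static Optional<String> findSplittableMidKey(
            List<String> blockFirstKeys, String startKey, String endKey) {
        int size = blockFirstKeys.size();
        int mid = size / 2;
        // Widen the search one entry at a time on both sides of the middle.
        for (int offset = 0; offset < size; offset++) {
            for (int idx : new int[] {mid - offset, mid + offset}) {
                if (idx < 0 || idx >= size) {
                    continue; // this side of the index is exhausted
                }
                String candidate = blockFirstKeys.get(idx);
                if (!candidate.equals(startKey) && !candidate.equals(endKey)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty(); // no usable split point in the index
    }
}
```

With a skewed index such as [a, a, a, a, b, c], start key a and end key c, the 
plain middle entry is a, which cannot be used to split, while the outward 
search settles on b. If every entry equals a boundary key the sketch returns 
an empty result, which corresponds to the case where the file still cannot be 
split.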



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)