[jira] Commented: (HBASE-1071) Set index interval at flush time based off count of keys and key attributes

Andrew Purtell (JIRA) Sat, 20 Dec 2008 04:55:09 -0800

    [ https://issues.apache.org/jira/browse/HBASE-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658282#action_12658282 ]


Andrew Purtell commented on HBASE-1071:
---------------------------------------

One way to approach this is to estimate the on-heap size of the index from the 
key count and key lengths. Then pick a size limit and increase the index 
interval as necessary until the estimated index size falls below that limit. 
This is simple and exposes only one knob, which is easy to tweak and gets 
directly at the desired effect. A suitable default can then be found by testing 
some educated guesses with PE (PerformanceEvaluation).
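
A rough sketch of that loop, in Java, might look like the following. It is 
illustrative only and not taken from the HBase code: the class and method 
names, the 64-byte per-entry overhead, and the choice to double the interval 
at each step are all assumptions made for the example.

```java
// Hypothetical sketch: pick an index interval so that the estimated
// on-heap index size stays at or below a single configured limit.
final class IndexIntervalSketch {

    // Assumed fixed heap overhead per index entry (object headers,
    // references, offset). A guess for illustration, not a measured value.
    static final long PER_ENTRY_OVERHEAD_BYTES = 64L;

    // Estimate index heap size from key count and total key bytes,
    // given that every interval-th key gets an index entry.
    static long estimateIndexBytes(long keyCount, long totalKeyBytes, int interval) {
        if (keyCount <= 0) {
            return 0L;
        }
        long avgKeyLength = totalKeyBytes / keyCount;
        long indexedKeys = (keyCount + interval - 1) / interval; // ceiling division
        return indexedKeys * (avgKeyLength + PER_ENTRY_OVERHEAD_BYTES);
    }

    // Start from the given interval and double it until the estimate fits
    // under maxIndexBytes, or until the interval already spans every key.
    static int chooseInterval(long keyCount, long totalKeyBytes,
                              int startInterval, long maxIndexBytes) {
        int interval = Math.max(1, startInterval);
        while (estimateIndexBytes(keyCount, totalKeyBytes, interval) > maxIndexBytes
                && interval < keyCount
                && interval <= Integer.MAX_VALUE / 2) {
            interval *= 2; // doubling is an assumption; any increasing step works
        }
        return interval;
    }
}
```

With these assumptions, one million keys averaging 20 bytes each, a starting 
interval of 32, and a 1,000,000-byte limit, chooseInterval(1_000_000L, 
20_000_000L, 32, 1_000_000L) returns 128. The limit (maxIndexBytes here) is 
the single knob.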

> Set index interval at flush time based off count of keys and key attributes
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-1071
>                 URL: https://issues.apache.org/jira/browse/HBASE-1071
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>
> From Andrew Purtell note up on list:
> "Later, maybe it would make sense to dynamically set the index
> interval based on the distribution of cell sizes in the 
> mapfile at some future time, according to some parameterized
> formula that could be adjusted with config variable(s). This
> could be done during compaction. Would make sense to also
> consider the distribution of key lengths. Or there could be
> other similar tricks implemented to keep index sizes down. "

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
