[
https://issues.apache.org/jira/browse/ACCUMULO-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311175#comment-15311175
]
Keith Turner commented on ACCUMULO-4314:
----------------------------------------
Another important change to note is the depth of the index tree. In
the original file the tree had 4 levels. After running with these changes it
has only 2 levels. Having fewer levels is not just a function of the total
index size. Larger keys tend to make the index tree deeper, so keeping
larger keys out of the index avoids this problem.
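
To illustrate why key size affects depth, here is a rough sketch under a
simplified model, not the actual RFile index code. It assumes each index block
has a fixed byte budget, so the number of entries per block (the fan-out) is
roughly the block size divided by the average key size. The class name, method
name, and all numbers are hypothetical and chosen only for illustration.

```java
public class IndexDepthSketch {

    /**
     * Estimates how many index levels are needed for the given number of
     * index entries. Simplified model (an assumption): per-entry overhead is
     * ignored and every block holds indexBlockBytes / avgKeyBytes entries.
     */
    public static int levels(long entries, int avgKeyBytes, int indexBlockBytes) {
        long fanOut = Math.max(2, indexBlockBytes / avgKeyBytes);
        int levels = 1;
        long blocks = entries;
        // Keep adding a parent level until everything fits in one root block.
        while (blocks > fanOut) {
            blocks = (blocks + fanOut - 1) / fanOut; // ceiling division
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        long entries = 1_000_000L;       // hypothetical number of index entries
        int blockBytes = 128 * 1024;     // hypothetical index block size

        // Small keys give a high fan-out and a shallow tree.
        System.out.println(levels(entries, 100, blockBytes));   // prints 2
        // Large keys give a low fan-out and a deeper tree.
        System.out.println(levels(entries, 4096, blockBytes));  // prints 4
    }
}
```

With the same number of entries, the larger average key size lowers the
fan-out and adds levels, which matches the direction of the effect described
above. The specific depths depend entirely on the made-up inputs.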
> Use statistics to choose better keys for RFile index
> ----------------------------------------------------
>
> Key: ACCUMULO-4314
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4314
> Project: Accumulo
> Issue Type: Improvement
> Reporter: Keith Turner
> Assignee: Keith Turner
> Priority: Blocker
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
>
> The commit for ACCUMULO-1124 makes two changes:
> * Generates shorter keys, which may not exist in the data, to place in the RFile index.
> * Uses statistics to make better choices about which keys to place in the index.
> These changes look for keys that are average or below and exclude large keys
> (keys that are > 3 std dev).
> The change to generate shorter keys cannot be made in 1.7.X and 1.6.X
> because it would generate RFiles that may not work properly with older 1.6
> and 1.7 versions. However, the changes to use statistics to pick better keys
> could be made in 1.6 and 1.7.
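
The statistics-based selection described in the issue could be sketched
roughly as follows. This is not the code from the commit. It assumes that
"> 3 std dev" means more than three standard deviations above the mean key
size, and it uses Welford's online algorithm to keep a running mean and
variance. The class and method names are hypothetical.

```java
public class KeySizeStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    /** Records the size in bytes of one key seen while writing the file. */
    public void add(int keySize) {
        count++;
        double delta = keySize - mean;
        mean += delta / count;
        m2 += delta * (keySize - mean);
    }

    /** Population standard deviation of the key sizes seen so far. */
    public double stdDev() {
        return count > 1 ? Math.sqrt(m2 / count) : 0.0;
    }

    /** Preferred index candidates: keys of average size or smaller. */
    public boolean isAverageOrBelow(int keySize) {
        return keySize <= mean;
    }

    /** Keys to keep out of the index: assumed to mean > mean + 3 std dev. */
    public boolean isTooLarge(int keySize) {
        return keySize > mean + 3 * stdDev();
    }
}
```

A writer using such a class might prefer the first key that passes
`isAverageOrBelow` once an index entry is due, and never pick a key for which
`isTooLarge` returns true. How the actual change handles the keys in between
is not stated in the issue text.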
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)