[ 
https://issues.apache.org/jira/browse/ACCUMULO-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308840#comment-15308840
 ] 

Keith Turner commented on ACCUMULO-4314:
----------------------------------------

I tested the 1.7 changes for this issue using the same file I used to test 
the changes for ACCUMULO-1124.  The total index size went from 6.9M to 3.6M.

{noformat}
$ accumulo rfile-info /accumulo/tables/2/default_tablet/A0000005.rf
Reading file: hdfs://localhost:10000/accumulo/tables/2/default_tablet/A0000005.rf
Locality group         : <DEFAULT>
    Start block          : 0
    Num   blocks         : 20,041
    Index level 1        : 4,140 bytes  1 blocks
    Index level 0        : 3,620,079 bytes  14 blocks
    First key            : um:d:385:%03;%01;10.30.170.244>>o>/2954%af; data:current [] 4611686019157309597 false
    Last key             : um:d:395:%03;%01;%ff; com.facebook>.www>s>/dialog/feed?app_id=90376669494... TRUNCATED data:current [] -6917529026891043602 false
    Num entries          : 24,299,468
    Column families      : [data]

Meta block     : BCFile.index
      Raw size             : 4 bytes
      Compressed size      : 12 bytes
      Compression type     : gz

Meta block     : RFile.index
      Raw size             : 4,258 bytes
      Compressed size      : 2,154 bytes
      Compression type     : gz
{noformat}

> Use statistics to choose better keys for RFile index
> ----------------------------------------------------
>
>                 Key: ACCUMULO-4314
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4314
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>            Priority: Blocker
>             Fix For: 1.6.6, 1.7.2
>
>
> The commit for ACCUMULO-1124 makes two changes:
>   * Generates shorter keys, which may not exist in the data, to place in the 
>     RFile index.
>   * Uses statistics to make better choices about which keys to place in the 
>     index.  This change looks for keys of average size or smaller and 
>     excludes large keys (keys more than 3 standard deviations above average).
> The change to generate shorter keys cannot be made in 1.7.X and 1.6.X 
> because it would generate RFiles that may not work properly with older 1.6 
> and 1.7 versions.  However, the change to use statistics to pick better keys 
> could be made in 1.6 and 1.7.
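
The statistics-based choice described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the Accumulo commit: the names KeySizeStats, is_large_key and pick_index_keys are hypothetical, blocks are measured in entries rather than bytes, and the block_entries / max_block_entries limits are assumptions made only to keep the example small. It also assumes "> 3 std dev" means more than 3 standard deviations above the mean key size. The sketch tracks a running mean and standard deviation of key sizes, prefers to end a block on a key of average size or smaller, and never ends a block on a large key.

```python
import math


class KeySizeStats:
    """Running mean and population std dev of key sizes (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, size):
        self.n += 1
        delta = size - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (size - self.mean)

    @property
    def stddev(self):
        return math.sqrt(self._m2 / self.n) if self.n else 0.0


def is_large_key(size, stats):
    """Assumed reading of '> 3 std dev': more than 3 std devs above the mean."""
    return size > stats.mean + 3 * stats.stddev


def pick_index_keys(key_sizes, block_entries=4, max_block_entries=8):
    """Return the positions of keys chosen to end a block (and go in the index).

    Hypothetical policy: once a block holds block_entries keys, end it on the
    first key of average size or smaller. If the block reaches
    max_block_entries, accept any key that is not large.
    """
    stats = KeySizeStats()
    chosen = []
    in_block = 0
    for i, size in enumerate(key_sizes):
        stats.add(size)
        in_block += 1
        if in_block < block_entries:
            continue  # block not full yet
        small_enough = size <= stats.mean
        forced = in_block >= max_block_entries and not is_large_key(size, stats)
        if small_enough or forced:
            chosen.append(i)
            in_block = 0
    return chosen
```

For example, with key sizes [10, 10, 10, 500, 10, 10, 10, 10] the sketch skips the 500-byte key at position 3, where the block first becomes full, and picks the 10-byte key at position 4 instead.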



