[ 
https://issues.apache.org/jira/browse/ACCUMULO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298266#comment-15298266
 ] 

Keith Turner commented on ACCUMULO-1124:
----------------------------------------

One thing I thought about but did not get to was making rfile-info print some 
stats about the index.  The average index key size can already be calculated 
from what rfile-info prints out (using num blocks and total index size).  For 
the histogram option, we could print stats and a histogram for both the index 
and all data.  Having the histogram and stats for all keys and for index keys 
would be really nice for comparing the index to all of the data in the file.
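
A rough sketch of the arithmetic, not actual rfile-info code.  The class and 
method names are hypothetical, and it assumes one index entry per data block, 
so total index size divided by num blocks only approximates the key size 
(index entries hold more than the key).  The histogram uses power-of-two 
buckets, which is just one possible choice.

```java
final class IndexKeyStats {
    /** Approximate bytes per index entry, assuming one entry per data block. */
    static double averageEntrySize(long totalIndexSizeBytes, long numBlocks) {
        if (numBlocks <= 0) {
            throw new IllegalArgumentException("numBlocks must be positive");
        }
        return (double) totalIndexSizeBytes / numBlocks;
    }

    /**
     * Power-of-two histogram of key sizes: bucket i counts sizes in
     * [2^i, 2^(i+1)).  Sizes of 0 or 1 land in bucket 0.
     */
    static long[] sizeHistogram(int[] keySizes) {
        long[] buckets = new long[32];
        for (int size : keySizes) {
            int bucket = size <= 1 ? 0 : 31 - Integer.numberOfLeadingZeros(size);
            buckets[bucket]++;
        }
        return buckets;
    }
}
```

Running sizeHistogram once over the index keys and once over all keys would 
give the two distributions to compare.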

I suspect that before this change larger keys may have had a higher chance of 
ending up in the index.  Before this change, when a data block exceeded the 
size threshold, the last key in that data block was put in the index.  Larger 
keys are more likely to push a data block over the threshold, and so more 
likely to be that last key.  Making rfile-info print out these index vs data 
stats would show this for older files.  Maybe I can add that to rfile-info in 
the PR.
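
To illustrate the suspected bias, here is a toy simulation, not RFile code. 
It assumes a block is closed as soon as its accumulated key bytes exceed the 
threshold and that the key that closed it goes in the index.  Values and 
key encoding are ignored, and the key sizes (10 or 1000 bytes), class name, 
and parameters are all made up for the example.

```java
import java.util.Random;

final class IndexBiasSimulation {
    /** Returns {mean size of all keys, mean size of keys put in the index}. */
    static double[] simulate(int numKeys, int blockThreshold, long seed) {
        Random rnd = new Random(seed);
        long allBytes = 0;
        long indexBytes = 0;
        long indexKeys = 0;
        int blockBytes = 0;
        for (int i = 0; i < numKeys; i++) {
            // Hypothetical mix: about 1 key in 10 is large.
            int keySize = rnd.nextInt(10) == 0 ? 1000 : 10;
            allBytes += keySize;
            blockBytes += keySize;
            if (blockBytes > blockThreshold) {
                // This key pushed the block over the threshold, so it is
                // the last key in the block and becomes the index key.
                indexBytes += keySize;
                indexKeys++;
                blockBytes = 0;
            }
        }
        double meanAll = (double) allBytes / numKeys;
        double meanIndex = indexKeys == 0 ? 0.0 : (double) indexBytes / indexKeys;
        return new double[] {meanAll, meanIndex};
    }
}
```

With a mix like this, the mean index key size should come out well above the 
mean over all keys, which is the skew the proposed stats would expose.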

> optimize index size in RFile
> ----------------------------
>
>                 Key: ACCUMULO-1124
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1124
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Eric Newton
>            Assignee: Keith Turner
>             Fix For: 1.8.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> I noticed HBASE-7845 and it seems like something we could do in RFile, too.
> Instead of putting the whole key in the index, you put in enough of the key 
> to get the reader to the beginning of the block.
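
The idea in the description above can be sketched roughly as follows.  This 
is not the HBase or Accumulo implementation; it treats keys as flat byte 
arrays compared as unsigned bytes (real Accumulo keys have several fields), 
and the class and method names are hypothetical.  Given the last key of one 
block (prev) and the first key of the next block (next), it returns the 
shortest prefix of next that still sorts after prev.

```java
import java.util.Arrays;

final class ShortSeparator {
    /** Shortest prefix s of next such that prev < s <= next. */
    static byte[] shorten(byte[] prev, byte[] next) {
        int limit = Math.min(prev.length, next.length);
        int i = 0;
        // Skip the common prefix of the two keys.
        while (i < limit && prev[i] == next[i]) {
            i++;
        }
        boolean prevIsProperPrefix = i == prev.length && next.length > prev.length;
        boolean nextByteLarger = i < limit && (next[i] & 0xff) > (prev[i] & 0xff);
        if (!prevIsProperPrefix && !nextByteLarger) {
            throw new IllegalArgumentException("prev must sort strictly before next");
        }
        // One byte past the common prefix is enough to sort after prev.
        return Arrays.copyOf(next, i + 1);
    }
}
```

For example, with prev "row_0001_cf" and next "row_0002_cf" the index would 
only need to hold "row_0002" instead of the full key.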



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
