[ https://issues.apache.org/jira/browse/ACCUMULO-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241194#comment-13241194 ]

Keith Turner commented on ACCUMULO-501:
---------------------------------------

Actually, RFile already stores this info for each index entry; it's just a
matter of using it.  It would be good to piggyback this computation on the scan
of the index that bulk import is already doing, or to have bulk import cache the
index if multiple scans are done.  If the inner nodes of the index tree contain
the sum of their children's entry counts, then the computation can be made
faster.
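
To illustrate the last point, here is a minimal sketch in Python rather than
Accumulo's Java.  The classes IndexEntry and InnerNode and the function
count_upto are hypothetical names, not RFile's actual index API.  It assumes
each leaf index entry records the last key of its data block and the number of
entries in that block, and that each inner node caches the sum over its
subtree, so whole subtrees can be counted without descending into them.

```python
class IndexEntry:
    """Leaf index entry: last key of a data block and its entry count."""

    def __init__(self, last_key, num_entries):
        self.last_key = last_key
        self.total = num_entries


class InnerNode:
    """Inner index node that caches the sum of its children's counts."""

    def __init__(self, children):
        self.children = children
        self.last_key = children[-1].last_key
        self.total = sum(child.total for child in children)


def count_upto(node, key):
    """Count entries in blocks whose last key is <= key (block granularity)."""
    if node.last_key <= key:
        return node.total          # whole subtree qualifies, no descent needed
    if isinstance(node, IndexEntry):
        return 0                   # block ends past the key, so skip it
    count = 0
    for child in node.children:
        if child.last_key <= key:
            count += child.total   # use the cached sum
        else:
            count += count_upto(child, key)
            break                  # later children end even further right
    return count


# Hypothetical two-level index over four data blocks.
root = InnerNode([
    InnerNode([IndexEntry("c", 10), IndexEntry("f", 20)]),
    InnerNode([IndexEntry("k", 5), IndexEntry("z", 15)]),
])

whole_file = root.total
in_range = count_upto(root, "m") - count_upto(root, "c")  # blocks ending in ("c", "m"]
```

Without the cached sums, the same count requires visiting every leaf entry;
with them, only one path from the root to a leaf is descended.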
                
> RFile should store the key count in metadata
> --------------------------------------------
>
>                 Key: ACCUMULO-501
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-501
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.5.0
>
>
> BulkImport estimates the number of keys in a file to be zero.  We store the 
> largest and smallest key in metadata; I think we can afford to store the key 
> count and use it to provide an estimate when we load the file into the tablet.  
> Perhaps if we know the start key is "a" and the end key is "z" and the tablet's 
> range is "a->m", we can just estimate 50% of the key count.
> When a bulk file fits completely in a range, the key count estimate will be 
> accurate.
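
The proportional estimate described in the issue could be sketched as follows
(in Python, not Accumulo's Java).  The functions key_position and
estimate_keys are hypothetical names, not part of Accumulo.  The sketch assumes
keys are byte strings spread roughly uniformly between the file's first and
last key, and it maps each key to a number using only its first 8 bytes.

```python
def key_position(key, width=8):
    """Map a key to an integer using its first `width` bytes (zero padded)."""
    return int.from_bytes(key[:width].ljust(width, b"\x00"), "big")


def estimate_keys(file_first, file_last, key_count,
                  tablet_start=None, tablet_end=None):
    """Estimate how many of a file's keys fall inside a tablet's range.

    A tablet bound of None means the range is unbounded on that side.
    Assumes keys are uniformly distributed between file_first and file_last.
    """
    lo_file = key_position(file_first)
    hi_file = key_position(file_last)
    # Clamp the tablet range to the file's key range.
    lo = lo_file if tablet_start is None else max(lo_file, key_position(tablet_start))
    hi = hi_file if tablet_end is None else min(hi_file, key_position(tablet_end))
    if hi_file == lo_file:
        # All keys share the same prefix: all or nothing.
        return key_count if lo <= hi else 0
    if hi <= lo:
        return 0
    return round(key_count * (hi - lo) / (hi_file - lo_file))


# The example from the issue: file spans "a".."z", tablet covers "a".."m".
partial = estimate_keys(b"a", b"z", 1000, b"a", b"m")
# A file that fits completely inside the tablet keeps its full count.
contained = estimate_keys(b"c", b"k", 1000, b"a", b"m")
```

With byte-wise interpolation the "a".."m" slice of an "a".."z" file comes out
at 12/25 of the keys, close to the 50% mentioned above.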
