[ 
https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341400#comment-14341400
 ] 

Lars Hofhansl commented on HBASE-12311:
---------------------------------------

Tested some more with KVs with 10, 100, and 10000 versions. The more versions we 
have, the more likely it becomes that we would seek past the next indexed key of 
the top scanner in the heap and hence keep the seek. So this appears to work 
fine with few and many versions, as well as few and many columns, and in all 
cases it will estimate whether a SKIP or a SEEK is better.
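
To make the comparison above concrete, here is a minimal sketch of the SKIP 
vs. SEEK decision. It is not the code from the attached patches: the class 
name SeekOrSkipSketch, the decide() method, the plain byte[] keys, and the 
fallback to SEEK when no indexed key is known are all assumptions made for 
illustration. The idea is only that a seek is kept when its target lies at 
or past the next indexed key (the first key of the next block) of the top 
scanner, and is otherwise replaced by repeated next() calls.

```java
/** Hypothetical sketch of the SKIP vs. SEEK estimate; not HBase's actual code. */
final class SeekOrSkipSketch {
    enum Action { SKIP, SEEK }

    /** Lexicographic comparison of two keys, treating bytes as unsigned. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * @param seekTarget     key we would (re)seek to, e.g. the start of the next column
     * @param nextIndexedKey first key of the next block of the top scanner in the heap,
     *                       or null if it is not known
     */
    static Action decide(byte[] seekTarget, byte[] nextIndexedKey) {
        if (nextIndexedKey == null) {
            // Assumption: with no index information we keep the seek.
            return Action.SEEK;
        }
        // Target is at or past the next block boundary: the seek can jump via the index.
        // Otherwise the target is inside the current block, so stepping with next() is cheaper.
        return compareKeys(seekTarget, nextIndexedKey) >= 0 ? Action.SEEK : Action.SKIP;
    }
}
```

With many versions per column, the start of the next column tends to fall 
beyond the current block, so decide() returns SEEK more often, which matches 
the behavior described above.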


> Version stats in HFiles?
> ------------------------
>
>                 Key: HBASE-12311
>                 URL: https://issues.apache.org/jira/browse/HBASE-12311
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>         Attachments: 12311-indexed-0.98.txt, 12311-v2.txt, 12311-v3.txt, 
> 12311.txt, CellStatTracker.java
>
>
> In HBASE-9778 I basically punted to the user the decision on whether to do 
> repeated scanner.next() calls instead of issuing (re)seeks.
> I think we can do better.
> One way to do that is to maintain simple stats on the maximum number of 
> versions we've seen for any row/col combination and store these in the 
> HFile's metadata (just like the timerange, oldest Put, etc).
> Then we can estimate fairly accurately whether we have to expect lots of versions 
> (i.e. seek between columns is better) or not (in which case we'd issue 
> repeated next()'s).
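
The stats idea in the description above could be sketched roughly as follows. 
This is an illustration only and not the attached CellStatTracker.java: the 
class MaxVersionTrackerSketch, its methods, and the simple threshold used in 
preferSeek() are hypothetical. It assumes cells arrive in HFile sort order, 
so that all versions of one row/col combination are adjacent.

```java
import java.util.Arrays;

/** Hypothetical sketch: tracks the maximum number of versions seen per row/col. */
final class MaxVersionTrackerSketch {
    private byte[] lastRow;
    private byte[] lastCol;
    private long currentRun;   // versions seen so far for the current row/col
    private long maxVersions;  // largest run seen for any row/col

    /** Call once per cell, in sorted order, while the file is being written. */
    void track(byte[] row, byte[] col) {
        if (Arrays.equals(row, lastRow) && Arrays.equals(col, lastCol)) {
            currentRun++;
        } else {
            // New row/col combination: start a new run.
            lastRow = row.clone();
            lastCol = col.clone();
            currentRun = 1;
        }
        maxVersions = Math.max(maxVersions, currentRun);
    }

    /** The value that would be stored in the file's metadata. */
    long maxVersions() {
        return maxVersions;
    }

    /** Assumption: a plain threshold decides between seeking and repeated next(). */
    boolean preferSeek(long threshold) {
        return maxVersions >= threshold;
    }
}
```

A reader of the file could then consult the stored maximum before scanning: 
a large value suggests seeking between columns, a small one suggests issuing 
repeated next() calls. The right threshold is not stated in the issue and 
would need to be measured.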



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)