[
https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179055#comment-14179055
]
Lars Hofhansl commented on HBASE-12311:
---------------------------------------
I would need to get it per columnFamily/HStore (because I want to avoid the
comparisons in SQM and below); on the blocks themselves that would be hard to
do. It would be similar to maxSequenceId, etc. Just another thing in the HFile
trailer.
Still wondering whether there's a better way, though. It would be nice if the
upper layers could suggest a (re)seek and then at the Store (or maybe even
StoreFile) level we could decide how to execute that. The problem there would
be to avoid multiple compares between the layers (which is the main reason why
seeks are so expensive even when the blocks are already in the cache).
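
To make the "suggest a (re)seek" idea concrete, here is a minimal sketch, not HBase code. HintedScanner, seekHint and maxSteps are hypothetical names, and a sorted array of distinct long keys stands in for the cells of a store file. The caller only names the target; the lower layer decides whether a few next()-style steps or a real seek gets there, so the comparisons happen in one place.

```java
import java.util.Arrays;

/** Hypothetical store-level scanner over sorted, distinct keys; not an HBase class. */
final class HintedScanner {
    private final long[] sortedKeys;
    private final int maxSteps; // assumed tuning knob: sequential steps to try before seeking
    private int pos;            // index of the current key

    HintedScanner(long[] sortedKeys, int maxSteps) {
        this.sortedKeys = sortedKeys;
        this.maxSteps = maxSteps;
    }

    /**
     * The upper layer only suggests a target. Returns true if the scanner is
     * now positioned on the first key >= target, false if no such key exists.
     */
    boolean seekHint(long target) {
        // Cheap path: step forward a bounded number of times, like repeated next().
        for (int i = 0; i < maxSteps && pos < sortedKeys.length; i++) {
            if (sortedKeys[pos] >= target) {
                return true;
            }
            pos++;
        }
        if (pos >= sortedKeys.length) {
            return false;
        }
        if (sortedKeys[pos] >= target) {
            return true;
        }
        // Expensive path: a real seek, here a binary search over the remaining keys.
        int idx = Arrays.binarySearch(sortedKeys, pos, sortedKeys.length, target);
        pos = idx >= 0 ? idx : -idx - 1;
        return pos < sortedKeys.length;
    }

    long current() {
        return sortedKeys[pos];
    }
}
```

With maxSteps = 2 and keys {1, 2, 3, 10, 20, 30}, a hint for 20 falls through to the binary search, while a following hint for 25 is satisfied by stepping.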
> Version stats in HFiles?
> ------------------------
>
> Key: HBASE-12311
> URL: https://issues.apache.org/jira/browse/HBASE-12311
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Lars Hofhansl
>
> In HBASE-9778 I basically punted to the user the decision on whether to do
> repeated scanner.next() calls instead of issuing (re)seeks.
> I think we can do better.
> One way to do that is to maintain simple stats on the maximum number of
> versions we've seen for any row/col combination and store these in the
> HFile's metadata (just like the timerange, oldest Put, etc).
> Then we could estimate fairly accurately whether we have to expect lots of
> versions (i.e. seek between columns is better) or not (in which case we'd
> issue repeated next()'s).
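
A rough sketch of the stat described above, under stated assumptions: MaxVersionsTracker, preferSeek and the threshold parameter are hypothetical names, not existing HBase APIs. Cells are assumed to arrive in sorted order (as they do when a file is written), so counting consecutive cells with the same row/col gives the per-column version count. Column family and delete markers are ignored for brevity.

```java
import java.util.Arrays;

/** Hypothetical writer-side tracker; not an HBase class. */
final class MaxVersionsTracker {
    private byte[] lastRow;
    private byte[] lastQualifier;
    private long currentRun;  // versions seen so far for the current row/col
    private long maxVersions; // largest run seen in this file

    /** Call once per cell, in the sorted order in which cells are written. */
    void track(byte[] row, byte[] qualifier) {
        boolean sameColumn = lastRow != null
                && Arrays.equals(row, lastRow)
                && Arrays.equals(qualifier, lastQualifier);
        currentRun = sameColumn ? currentRun + 1 : 1;
        if (currentRun > maxVersions) {
            maxVersions = currentRun;
        }
        lastRow = row;
        lastQualifier = qualifier;
    }

    /** The value that would be stored in the file's metadata. */
    long getMaxVersions() {
        return maxVersions;
    }

    /**
     * Reader-side heuristic (an assumption, not a measured rule): seek to the
     * next column only if stepping could mean skipping more than 'threshold'
     * extra versions per column.
     */
    static boolean preferSeek(long maxVersionsInFile, int versionsRequested, long threshold) {
        long skipped = Math.max(0, maxVersionsInFile - versionsRequested);
        return skipped > threshold;
    }
}
```

A scan asking for one version of a file whose stored maximum is 3 would keep calling next() with a threshold of 10, while a stored maximum of 1000 would tip it toward seeking.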