[
https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lars Hofhansl updated HBASE-12311:
----------------------------------
Attachment: 12311-v2.txt
Got another few minutes on this. Bit further now. Still not working at all.
Again just want to park it.
* collects standard deviation as well
* knows how to combine trackers from multiple files - including the standard
deviation correctly
The only part left, really, is to hook this up in SQM.
Once I get this working, I'll do some number and see whether this effort is
worth it.
> Version stats in HFiles?
> ------------------------
>
> Key: HBASE-12311
> URL: https://issues.apache.org/jira/browse/HBASE-12311
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Lars Hofhansl
> Attachments: 12311-v2.txt, 12311.txt, CellStatTracker.java
>
>
> In HBASE-9778 I basically punted the decision on whether doing repeated
> scanner.next() called instead of the issueing (re)seeks to the user.
> I think we can do better.
> One way do that is maintain simple stats of what the maximum number of
> versions we've seen for any row/col combination and store these in the
> HFile's metadata (just like the timerange, oldest Put, etc).
> Then we estimate fairly accurately whether we have to expect lots of versions
> (i.e. seek between columns is better) or not (in which case we'd issue
> repeated next()'s).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)