[ https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180065#comment-14180065 ]
Andrew Purtell commented on HBASE-12311:
----------------------------------------
Well, we had HBASE-7958 but it fizzled out. One issue seemed to be that
maintaining a stats table duplicates the metrics reporting and metrics
aggregation/history that will already be in place externally (?). So I proposed in
https://issues.apache.org/jira/browse/HBASE-7958?focusedCommentId=13997314&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13997314
not surfacing the stats calculated when processing HFiles into a system table, but
instead keeping them as internal metadata that HFile/HStore could get at. The
proposal was "maintain a tree of statistic files in HDFS", but this information
could be embedded in the HFiles themselves. The information there could also be
exported to the metrics subsystem. Should we revive that issue? Per-block HFile
statistics would be something new, though, I think.
> Version stats in HFiles?
> ------------------------
>
> Key: HBASE-12311
> URL: https://issues.apache.org/jira/browse/HBASE-12311
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Lars Hofhansl
>
> In HBASE-9778 I basically punted to the user the decision on whether to do
> repeated scanner.next() calls instead of issuing (re)seeks.
> I think we can do better.
> One way to do that is to maintain simple stats of the maximum number of
> versions we've seen for any row/col combination and store these in the
> HFile's metadata (just like the timerange, oldest Put, etc.).
> Then we can estimate fairly accurately whether we have to expect lots of versions
> (i.e. seeking between columns is better) or not (in which case we'd issue
> repeated next() calls).
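A read-side counterpart to this idea might look like the following hedged Java sketch. It assumes each store file exposes the per-file max-versions statistic (with -1 meaning the statistic is absent), treats the sum across files as an upper bound because versions of one row/column can be spread over several files, and compares the number of cells left to skip against an assumed nextThreshold tuning knob. SkipStrategy, choose() and its parameters are hypothetical, not HBase APIs.

```java
import java.util.List;

/**
 * Hypothetical sketch (not an HBase API): decides whether to move past the
 * remaining versions of a column with repeated next() calls or with a reseek.
 */
class SkipStrategy {
    enum Strategy { NEXT, SEEK }

    /**
     * @param perFileMaxVersions   max versions per row/column recorded in each store file (-1 = unknown)
     * @param maxVersionsRequested versions the scan returns before skipping the rest of the column
     * @param nextThreshold        assumed cut-off: skipping at most this many cells uses next()
     */
    static Strategy choose(List<Integer> perFileMaxVersions, int maxVersionsRequested, int nextThreshold) {
        long upperBound = 0;
        for (int v : perFileMaxVersions) {
            if (v < 0) {
                return Strategy.SEEK; // no statistic for this file: fall back to seeking
            }
            upperBound += v; // versions of one column may be split across files
        }
        // Cells that may still have to be stepped over after the requested versions.
        long toSkip = Math.max(0, upperBound - maxVersionsRequested);
        return toSkip <= nextThreshold ? Strategy.NEXT : Strategy.SEEK;
    }
}
```

Summing the per-file maxima is deliberately pessimistic; a tighter bound would need per-row or per-block statistics, which is the kind of new per-block information the comment above mentions.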
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)