[ https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180065#comment-14180065 ]

Andrew Purtell commented on HBASE-12311:
----------------------------------------

Well, we had HBASE-7958, but it fizzled out. One issue seemed to be that maintaining a 
stats table duplicates the metrics reporting and metrics aggregation/history that 
will already be in place externally (?). So I proposed 
https://issues.apache.org/jira/browse/HBASE-7958?focusedCommentId=13997314&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13997314
 that stats calculated when processing HFiles not be surfaced into a system table, but 
instead be kept as internal metadata that HFile/HStore could get at. The 
proposal was to "maintain a tree of statistic files in HDFS", but this information 
could instead be embedded in the HFiles themselves. The information there could also be 
exported to the metrics subsystem. Should we revive that issue? Per-block HFile 
statistics would be something new, though, I think. 
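As a rough illustration of keeping such statistics per block and embedding them in the file's own metadata, here is a minimal Java sketch. It is not HBase code: the class, its `append`/`finishBlock` hooks, and the file-info key names are hypothetical, and it assumes cells are appended in HFile sort order (row, column, then newest timestamp first), so all versions of one row/column arrive adjacently.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not an HBase API: collects version statistics while an
// HFile is written, assuming cells arrive in sort order (row, column, then
// timestamps newest-first), so all versions of one row/column are adjacent.
final class PerBlockVersionStats {
    private final List<Integer> blockMaxVersions = new ArrayList<>();
    private String lastRow;     // row/column of the previous cell (null at start)
    private String lastColumn;
    private int runInFile;      // versions of the current row/column seen in this file
    private int runInBlock;     // versions of the current row/column seen in this block
    private int blockMax;       // max versions for any row/column within this block
    private int fileMax;        // max versions for any row/column within the file

    /** Record one cell; call in HFile sort order. */
    void append(String row, String column) {
        if (row.equals(lastRow) && column.equals(lastColumn)) {
            runInFile++;
            runInBlock++;
        } else {
            lastRow = row;
            lastColumn = column;
            runInFile = 1;
            runInBlock = 1;
        }
        blockMax = Math.max(blockMax, runInBlock);
        fileMax = Math.max(fileMax, runInFile);
    }

    /** Close the current data block; must also be called for the last block. */
    void finishBlock() {
        blockMaxVersions.add(blockMax);
        blockMax = 0;
        runInBlock = 0; // a run spanning the boundary restarts its per-block count
    }

    int fileMax() { return fileMax; }

    List<Integer> blockMaxVersions() { return new ArrayList<>(blockMaxVersions); }

    /** Serialize as file-info style key/value bytes; the key names are made up. */
    Map<String, byte[]> toFileInfo() {
        Map<String, byte[]> info = new LinkedHashMap<>();
        info.put("MAX_VERSIONS_PER_ROWCOL",
                ByteBuffer.allocate(Integer.BYTES).putInt(fileMax).array());
        ByteBuffer perBlock = ByteBuffer.allocate(Integer.BYTES * blockMaxVersions.size());
        for (int m : blockMaxVersions) {
            perBlock.putInt(m);
        }
        info.put("BLOCK_MAX_VERSIONS", perBlock.array());
        return info;
    }
}
```

Under these assumptions a reader could consult the file-level value without touching any data block, and the same numbers could also be handed to the metrics subsystem.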

> Version stats in HFiles?
> ------------------------
>
>                 Key: HBASE-12311
>                 URL: https://issues.apache.org/jira/browse/HBASE-12311
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>
> In HBASE-9778 I basically punted to the user the decision on whether to do repeated 
> scanner.next() calls instead of issuing (re)seeks.
> I think we can do better.
> One way to do that is to maintain simple stats of the maximum number of 
> versions we've seen for any row/col combination and store these in the 
> HFile's metadata (just like the timerange, oldest Put, etc.).
> Then we could estimate fairly accurately whether we have to expect lots of versions 
> (i.e. seeking between columns is better) or not (in which case we'd issue 
> repeated next()'s).
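A minimal Java sketch of the read-side decision described above, under stated assumptions: each HFile's metadata carries the maximum number of versions seen for any row/column, and the scanner knows how many versions the query wants. `SeekOrNextAdvisor` and its threshold are hypothetical, not an HBase API. Because versions of one row/column can be spread over several HFiles, the sketch sums the per-file maxima as a conservative upper bound; it ignores the memstore.

```java
import java.util.List;

// Hypothetical sketch, not an HBase API: chooses between repeated next() calls
// and a reseek to the next column, using per-HFile "max versions for any
// row/column" values assumed to have been read from each file's metadata.
final class SeekOrNextAdvisor {
    // Assumed tuning knob: the most cells we are willing to step over with next()
    // before a reseek is expected to be cheaper. Not an actual HBase setting.
    private final int nextSkipThreshold;

    SeekOrNextAdvisor(int nextSkipThreshold) {
        this.nextSkipThreshold = nextSkipThreshold;
    }

    // Versions of one row/column can be spread over several HFiles, so the sum of
    // the per-file maxima is a conservative upper bound (memstore not included).
    static long versionUpperBound(List<Integer> maxVersionsPerFile) {
        long bound = 0;
        for (int m : maxVersionsPerFile) {
            bound += m;
        }
        return bound;
    }

    /** Returns true to reseek to the next column, false to keep calling next(). */
    boolean shouldSeek(List<Integer> maxVersionsPerFile, int versionsRequested) {
        long worstCaseSkips =
                Math.max(0L, versionUpperBound(maxVersionsPerFile) - versionsRequested);
        return worstCaseSkips > nextSkipThreshold;
    }
}
```

Because the statistic is a maximum, a single row/column with many versions pushes the whole file toward seeking; that pessimism is the trade-off a simple max statistic implies.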



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
