[ 
https://issues.apache.org/jira/browse/HBASE-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459821#comment-13459821
 ] 

Andrew Purtell edited comment on HBASE-6799 at 9/21/12 5:21 AM:
----------------------------------------------------------------

A generic/custom tags facility would be great; then we could try out a number 
of things without patching core.

I would like to see CF access statistics. We could take a snapshot of the 
current CF metrics when the HFile is written. Then we would have a local memory 
of dynamic per-CF metrics, for such things as HBASE-6572. Compaction could 
perhaps merge the CF statistics snapshots stored in its input HFiles, using 
time-based exponential weighting. Further, we might differentiate between 
"online" measurements (<= 15 minutes) and a longer historical view of per-CF 
metrics, and initialize the latter from the most recent HFile after region 
migration or cold boot.
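
As a rough sketch of the compaction-time merge mentioned above: the class 
CfStatsSnapshot, its fields, and the time constant tauMs are all hypothetical 
names chosen for illustration, not existing HBase APIs. It assumes each HFile 
carries one snapshot with a timestamp and a single metric, and weights each 
snapshot by exp(-age / tau), with age measured relative to the newest input. 
That is only one possible weighting choice.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-CF statistics snapshot; not an HBase class. */
final class CfStatsSnapshot {
    final long timestampMs;   // when the snapshot was taken, e.g. at flush time
    final double readsPerSec; // illustrative metric only

    CfStatsSnapshot(long timestampMs, double readsPerSec) {
        this.timestampMs = timestampMs;
        this.readsPerSec = readsPerSec;
    }

    /**
     * Merges snapshots into one, weighting each by exp(-age / tauMs), where
     * age is the distance in time from the newest input snapshot. The newest
     * snapshot therefore always has weight 1, so the weight total is >= 1.
     */
    static CfStatsSnapshot merge(List<CfStatsSnapshot> inputs, double tauMs) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("nothing to merge");
        }
        long newest = Long.MIN_VALUE;
        for (CfStatsSnapshot s : inputs) {
            newest = Math.max(newest, s.timestampMs);
        }
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (CfStatsSnapshot s : inputs) {
            double age = (double) (newest - s.timestampMs);
            double weight = Math.exp(-age / tauMs);
            weightedSum += weight * s.readsPerSec;
            weightTotal += weight;
        }
        // The merged snapshot keeps the timestamp of the newest input.
        return new CfStatsSnapshot(newest, weightedSum / weightTotal);
    }

    public static void main(String[] args) {
        CfStatsSnapshot older = new CfStatsSnapshot(0L, 0.0);
        CfStatsSnapshot newer = new CfStatsSnapshot(1000L, 100.0);
        CfStatsSnapshot merged = merge(Arrays.asList(older, newer), 1000.0);
        System.out.println(merged.timestampMs + " " + merged.readsPerSec);
    }
}
```

With a small tau the merged value tracks the most recent flush closely; a 
larger tau gives the longer historical view.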
                
      was (Author: apurtell):
    A generic/custom tags facility would be great, then we can try out a number 
of things without requiring core patching.

I would like to see CF access statistics. Could do a snapshot of current CF 
metrics when the HFile is written, as a first cut. Then dynamic per-CF metrics 
could be reinitialized after region migration or cold boot from the most recent 
HFile - a recent flush, presumably. Perhaps we might want to differentiate 
between "online" measurements (<= 15 minutes) and a longer historical view, and 
initialize only the latter. Anyway, then we have a local memory of the per-CF 
metrics, for such things as HBASE-6572.
                  
> Store more metadata in HFiles
> -----------------------------
>
>                 Key: HBASE-6799
>                 URL: https://issues.apache.org/jira/browse/HBASE-6799
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>
> Currently we store the following metadata in an HFile:
> * the timerange of KVs
> * the earliest PUT ts
> * max sequence id
> * whether or not this file was created from a major compaction.
> I would like to brainstorm what extra data we need to store to make an HFile 
> self-describing. I.e. it could be backed up somewhere, and external tools 
> (without invoking an HBase server) could glean enough information from it to 
> make use of the data inside. Ideally it would also be possible to recreate 
> .META. from a bunch of HFiles, in order to stand up a temporary HBase 
> instance to process them.
> What I can think of:
> * min/max key
> * table
> * column family (or families to be future proof)
> * custom tags (set by a backup tool, for example)
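
To make the list of proposed fields in the issue description concrete, here is 
a minimal sketch of how they might be laid out as key/value file metadata. The 
class name, the key names (TABLE, FAMILY, MIN_KEY, MAX_KEY) and the "TAG." 
prefix are assumptions made up for this example; nothing here is an existing 
HBase format, and strings are used instead of byte arrays only for brevity.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical self-description for one HFile; not an HBase class. */
final class HFileSelfDescription {

    /**
     * Flattens the proposed fields into a single key/value map. Custom tags
     * get a "TAG." prefix so they cannot collide with the built-in keys.
     */
    static Map<String, String> describe(String table, String family,
                                        String minKey, String maxKey,
                                        Map<String, String> customTags) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("TABLE", table);
        info.put("FAMILY", family);
        info.put("MIN_KEY", minKey);
        info.put("MAX_KEY", maxKey);
        for (Map.Entry<String, String> tag : customTags.entrySet()) {
            info.put("TAG." + tag.getKey(), tag.getValue());
        }
        return info;
    }

    public static void main(String[] args) {
        Map<String, String> tags =
            Collections.singletonMap("backup.tool", "nightly");
        System.out.println(describe("t1", "cf", "a", "z", tags));
    }
}
```

An external tool could group files by the TABLE and FAMILY entries and order 
them by MIN_KEY without contacting a running cluster.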
