[ 
https://issues.apache.org/jira/browse/HBASE-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457181#comment-13457181
 ] 

stack commented on HBASE-6799:
------------------------------

Here is a dump of hfile metadata from production:

{code}
Block index size as per heapsize: 110632
reader=/hbase/ad_campaign_monthly_stumbles/2081100778/default/77955d7c8845435dbcfe7b91a55fd1c4,
    compression=lzo,
    cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=f
    firstKey=100000:2009:06/default:organic/1264629982792/Put,
    lastKey=9:2006:03/default:paid/1260930865681/Put,
    avgKeyLen=38,
    avgValueLen=8,
    entries=1561501,
    length=19277379
Trailer:
    fileinfoOffset=19276842,
    loadOnOpenDataOffset=19248105,
    dataIndexCount=1313,
    metaIndexCount=0,
    totalUncomressedBytes=86064912,
    entryCount=1561501,
    compressionCodec=LZO,
    uncompressedDataIndexSize=67170,
    numDataIndexLevels=1,
    firstDataBlockOffset=0,
    lastDataBlockOffset=19247463,
    comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
    majorVersion=2,
    minorVersion=1
Fileinfo:
    DATA_BLOCK_ENCODING = NONE
    DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
    EARLIEST_PUT_TS = \x00\x00\x01%\x95\x02\xD9\xDA
    KEY_VALUE_VERSION = \x00\x00\x00\x01
    MAJOR_COMPACTION_KEY = \xFF
    MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
    MAX_SEQ_ID_KEY = 26057054872
    TIMERANGE = 1260925409754....1266607612712
    hfile.AVG_KEY_LEN = 38
    hfile.AVG_VALUE_LEN = 8
    hfile.LASTKEY = \x00\x099:2006:03\x07defaultpaid\x00\x00\x01%\x95V\x1A\x11\x04
Mid-key: \x00\x0D43195:2008:04\x07defaultpaid\x00\x00\x01%\x95W\x9C\x86\x04
Bloom filter:
    Not present
Delete Family Bloom filter:
    Not present
{code}
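
The escaped Fileinfo values in the dump look like Bytes.toStringBinary output: printable bytes shown as-is, everything else as \xHH. Assuming that encoding, and assuming EARLIEST_PUT_TS is an 8-byte big-endian long, a rough Python sketch for decoding it could look like this. The helper names (parse_string_binary, to_long) are made up for illustration and are not part of HBase.

```python
import struct
from datetime import datetime, timezone


def parse_string_binary(s: str) -> bytes:
    """Turn an escaped string such as r"\x00\x01%" back into raw bytes.

    Assumes \\xHH escapes for non-printable bytes and literal ASCII otherwise.
    """
    out = bytearray()
    i = 0
    while i < len(s):
        if s.startswith("\\x", i) and i + 4 <= len(s):
            out.append(int(s[i + 2:i + 4], 16))  # exactly two hex digits
            i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)


def to_long(raw: bytes) -> int:
    """Interpret 8 bytes as a signed big-endian long."""
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return struct.unpack(">q", raw)[0]


# EARLIEST_PUT_TS copied from the dump above.
earliest_put_ts = to_long(parse_string_binary(r"\x00\x00\x01%\x95\x02\xD9\xDA"))
print(earliest_put_ts)  # 1260925409754
# Treating the value as epoch milliseconds (an assumption):
print(datetime.fromtimestamp(earliest_put_ts / 1000, tz=timezone.utc))
```

The decoded value is the same number as the lower bound of TIMERANGE in the dump.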

I'd have to look at the code, but the above might be made up of hfile metadata 
plus a toString on the Reader (the Reader might seek to the first key on open... 
and get the last key from the hfile meta... which would not be the same as 
having all this data in the hfile meta).
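
For what it's worth, the hfile.LASTKEY value in the dump can be picked apart by hand and compared with the lastKey line the reader prints. A rough Python sketch, assuming the usual KeyValue key layout (2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type) and \xHH escaping for non-printable bytes. The function names are hypothetical, not HBase API.

```python
import struct


def parse_string_binary(s: str) -> bytes:
    """Turn an escaped string (\\xHH for non-printable bytes) into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s.startswith("\\x", i) and i + 4 <= len(s):
            out.append(int(s[i + 2:i + 4], 16))  # exactly two hex digits
            i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)


def split_key(key: bytes):
    """Split a serialized key into (row, family, qualifier, timestamp, type)."""
    row_len = struct.unpack_from(">H", key, 0)[0]  # 2-byte row length
    row = key[2:2 + row_len]
    fam_len = key[2 + row_len]                     # 1-byte family length
    fam_start = 3 + row_len
    family = key[fam_start:fam_start + fam_len]
    qualifier = key[fam_start + fam_len:-9]        # everything before ts + type
    timestamp = struct.unpack(">q", key[-9:-1])[0]  # 8-byte big-endian long
    type_code = key[-1]                            # 4 is assumed to mean Put
    return row, family, qualifier, timestamp, type_code


# hfile.LASTKEY copied from the Fileinfo section of the dump.
LAST_KEY = r"\x00\x099:2006:03\x07defaultpaid\x00\x00\x01%\x95V\x1A\x11\x04"

row, family, qualifier, timestamp, type_code = split_key(parse_string_binary(LAST_KEY))
print(row, family, qualifier, timestamp, type_code)
```

The pieces line up with lastKey=9:2006:03/default:paid/1260930865681/Put at the top of the dump, so at least the last key does seem to be coming out of the file info.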

Whether it's major compacted is already in there... a bunch more could be added.
                
> Store more metadata in HFiles
> -----------------------------
>
>                 Key: HBASE-6799
>                 URL: https://issues.apache.org/jira/browse/HBASE-6799
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>
> Currently we store metadata in HFile:
> * the timerange of KVs
> * the earliest PUT ts
> * max sequence id
> * whether or not this file was created from a major compaction.
> I would like to brainstorm what extra data we need to store to make an HFile 
> self describing. I.e. it could be backed up to somewhere and external tools 
> (without invoking an HBase server) could glean enough information from it to 
> make use of the data inside. Ideally it would also be nice to be able to 
> recreate .META. from a bunch of HFiles to stand up a temporary HBase instance 
> to process a bunch of HFiles.
> What I can think of:
> * min/max key
> * table
> * column family (or families to be future proof)
> * custom tags (set by a backup tool for example)

