[ https://issues.apache.org/jira/browse/HBASE-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457181#comment-13457181 ]
stack commented on HBASE-6799:
------------------------------
Here is a dump of HFile metadata from production:
{code}
Block index size as per heapsize: 110632
reader=/hbase/ad_campaign_monthly_stumbles/2081100778/default/77955d7c8845435dbcfe7b91a55fd1c4,
compression=lzo,
cacheConf=CacheConfig:enabled [cacheDataOnRead=true]
[cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false]
[cacheEvictOnClose=false] [cacheCompressed=f
firstKey=100000:2009:06/default:organic/1264629982792/Put,
lastKey=9:2006:03/default:paid/1260930865681/Put,
avgKeyLen=38,
avgValueLen=8,
entries=1561501,
length=19277379
Trailer:
fileinfoOffset=19276842,
loadOnOpenDataOffset=19248105,
dataIndexCount=1313,
metaIndexCount=0,
totalUncomressedBytes=86064912,
entryCount=1561501,
compressionCodec=LZO,
uncompressedDataIndexSize=67170,
numDataIndexLevels=1,
firstDataBlockOffset=0,
lastDataBlockOffset=19247463,
comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
majorVersion=2,
minorVersion=1
Fileinfo:
DATA_BLOCK_ENCODING = NONE
DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
EARLIEST_PUT_TS = \x00\x00\x01%\x95\x02\xD9\xDA
KEY_VALUE_VERSION = \x00\x00\x00\x01
MAJOR_COMPACTION_KEY = \xFF
MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
MAX_SEQ_ID_KEY = 26057054872
TIMERANGE = 1260925409754....1266607612712
hfile.AVG_KEY_LEN = 38
hfile.AVG_VALUE_LEN = 8
hfile.LASTKEY = \x00\x099:2006:03\x07defaultpaid\x00\x00\x01%\x95V\x1A\x11\x04
Mid-key: \x00\x0D43195:2008:04\x07defaultpaid\x00\x00\x01%\x95W\x9C\x86\x04
Bloom filter:
Not present
Delete Family Bloom filter:
Not present
{code}
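For anyone reading the dump by hand: the FileInfo values are raw bytes printed in escaped form (printable ASCII as-is, everything else as \xHH). Below is a minimal Java sketch that decodes a few of them. It assumes HBase's Bytes conventions (big-endian long and int, a one-byte boolean that is true when non-zero). The 8-, 4- and 1-byte widths in the dump are consistent with that, but treat it as an assumption. FileInfoDecode and its helpers are hypothetical standalone stand-ins, not HBase API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical standalone helpers (not HBase API). Assumes the dump escapes
// bytes the HBase way: printable ASCII as-is, other bytes as \xHH.
public class FileInfoDecode {

    // Turn an escaped string back into the raw bytes it represents.
    static byte[] unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3; // skip the "xHH" that follows the backslash
            } else {
                out.write(c); // printable ASCII byte, copied literally
            }
        }
        return out.toByteArray();
    }

    // ByteBuffer reads big-endian by default, matching the assumed encoding.
    static long toLong(byte[] b) { return ByteBuffer.wrap(b).getLong(); }

    static int toInt(byte[] b) { return ByteBuffer.wrap(b).getInt(); }

    // Assumed boolean encoding: exactly one byte, non-zero means true.
    static boolean toBoolean(byte[] b) { return b.length == 1 && b[0] != 0; }

    public static void main(String[] args) {
        // Values copied from the Fileinfo section of the dump above.
        System.out.println("EARLIEST_PUT_TS      = "
                + toLong(unescape("\\x00\\x00\\x01%\\x95\\x02\\xD9\\xDA")));
        System.out.println("DELETE_FAMILY_COUNT  = "
                + toLong(unescape("\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00")));
        System.out.println("KEY_VALUE_VERSION    = "
                + toInt(unescape("\\x00\\x00\\x00\\x01")));
        System.out.println("MAJOR_COMPACTION_KEY = "
                + toBoolean(unescape("\\xFF")));
    }
}
```

Decoded this way, EARLIEST_PUT_TS comes out as 1260925409754, the same value as the lower bound of TIMERANGE, and MAJOR_COMPACTION_KEY reads as true.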
I'd have to look at the code, but the above may be a mix of stored metadata and a
toString() on the Reader (the Reader might seek to the first key on open and take the
last key from the HFile metadata, which is not the same as having all of this data
stored in the HFile metadata).
Whether it was major compacted is already in there (MAJOR_COMPACTION_KEY); a bunch more could be added.
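On the last-key point: the hfile.LASTKEY entry already holds the last key in raw form, so a tool can recover it without opening a Reader. Here is a minimal Java sketch that splits such a key. It assumes the KeyValue key layout of this HBase version (2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type code with 4 meaning Put) and the same \xHH escaping as the dump. KeyDump is a hypothetical name, not HBase API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone sketch (not HBase API). Assumed key layout:
// [row len: 2][row][family len: 1][family][qualifier][timestamp: 8][type: 1]
public class KeyDump {

    // Escaped dump text ("\xHH" for non-printable bytes) back to raw bytes.
    static byte[] unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    static String ascii(byte[] b) { return new String(b, StandardCharsets.US_ASCII); }

    static String describe(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key); // big-endian
        byte[] row = new byte[buf.getShort() & 0xFFFF];
        buf.get(row);
        byte[] family = new byte[buf.get() & 0xFF];
        buf.get(family);
        // No qualifier length is stored: it is whatever is left
        // before the trailing 8-byte timestamp and 1-byte type.
        byte[] qualifier = new byte[buf.remaining() - 8 - 1];
        buf.get(qualifier);
        long timestamp = buf.getLong();
        int type = buf.get() & 0xFF;
        String typeName = (type == 4) ? "Put" : "type=" + type; // only Put is mapped here
        return ascii(row) + "/" + ascii(family) + ":" + ascii(qualifier)
                + "/" + timestamp + "/" + typeName;
    }

    public static void main(String[] args) {
        // hfile.LASTKEY and Mid-key values copied from the dump above.
        System.out.println(describe(unescape(
                "\\x00\\x099:2006:03\\x07defaultpaid\\x00\\x00\\x01%\\x95V\\x1A\\x11\\x04")));
        System.out.println(describe(unescape(
                "\\x00\\x0D43195:2008:04\\x07defaultpaid\\x00\\x00\\x01%\\x95W\\x9C\\x86\\x04")));
    }
}
```

Run on the hfile.LASTKEY value, this prints 9:2006:03/default:paid/1260930865681/Put, matching the lastKey line. The dump's Fileinfo has no matching first-key entry, so firstKey has to come from somewhere else in the file.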
> Store more metadata in HFiles
> -----------------------------
>
> Key: HBASE-6799
> URL: https://issues.apache.org/jira/browse/HBASE-6799
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Lars Hofhansl
>
> Currently we store the following metadata in an HFile:
> * the timerange of KVs
> * the earliest PUT ts
> * max sequence id
> * whether or not this file was created from a major compaction.
> I would like to brainstorm what extra data we need to store to make an HFile
> self-describing, i.e. so that it could be backed up somewhere and external tools
> (without invoking an HBase server) could glean enough information from it to
> make use of the data inside. Ideally, it would also be possible to recreate
> .META. from a set of HFiles and stand up a temporary HBase instance to process
> them.
> What I can think of:
> * min/max key
> * table
> * column family (or families, to be future-proof)
> * custom tags (set by backup tools, for example)
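To make the list above concrete, here is a hypothetical Java sketch of how those fields could be written as extra FileInfo-style entries. The key names (TABLE_NAME, FAMILY_NAMES, MIN_KEY, MAX_KEY and the "tag." prefix) are invented for illustration and are not existing HBase constants. Families are stored as a length-prefixed list so a file could name more than one. In the sample, the table and family come from the file path in the production dump, the keys are just the row parts of firstKey/lastKey, and the backup tag is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: every key name below is invented, not an HBase constant.
public class SelfDescribingInfo {

    static byte[] utf8(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    // Length-prefixed list, so more than one family can be recorded.
    static byte[] encodeList(List<String> items) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(items.size());
            for (String item : items) {
                out.writeUTF(item);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    static List<String> decodeList(byte[] b) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(b))) {
            int n = in.readInt();
            List<String> items = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                items.add(in.readUTF());
            }
            return items;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds the proposed entries; custom tags go under a "tag." prefix.
    static Map<String, byte[]> build(String table, List<String> families,
                                     byte[] minKey, byte[] maxKey,
                                     Map<String, String> tags) {
        Map<String, byte[]> info = new TreeMap<>();
        info.put("TABLE_NAME", utf8(table));
        info.put("FAMILY_NAMES", encodeList(families));
        info.put("MIN_KEY", minKey);
        info.put("MAX_KEY", maxKey);
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            info.put("tag." + tag.getKey(), utf8(tag.getValue()));
        }
        return info;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new TreeMap<>();
        tags.put("backup.id", "example-backup"); // made-up tag
        // Table and family from the file path; keys are only the row parts.
        Map<String, byte[]> info = build("ad_campaign_monthly_stumbles",
                List.of("default"), utf8("100000:2009:06"), utf8("9:2006:03"), tags);
        System.out.println("entries  = " + info.keySet());
        System.out.println("table    = "
                + new String(info.get("TABLE_NAME"), StandardCharsets.UTF_8));
        System.out.println("families = " + decodeList(info.get("FAMILY_NAMES")));
    }
}
```

Today the table and family can only be inferred from the file's path; with entries like these, a file copied out of HBase would still carry them.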