[
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15482795#comment-15482795
]
binlijin commented on HBASE-16594:
----------------------------------
I took part of column family a's data and tested it with ROW_INDEX_V2.
First, the details:
{code}
number of rows : 456399
avgKeyLen=56
avgValueLen=11
entries=69742427
length=5609482650
avg cells per row : 69742427/456399=152.8
avg row size: (56+11) * 152.8=10237.6(10k)
COMPRESSION => 'NONE'
BlockSize=8k DATA_BLOCK_ENCODING => 'NONE' 5671843807
BlockSize=8k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 5683168196
BlockSize=8k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 3354641599
BlockSize=16k DATA_BLOCK_ENCODING => 'NONE' 5636883803
BlockSize=16k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 5643473654
BlockSize=16k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 3306460265
BlockSize=32k DATA_BLOCK_ENCODING => 'NONE' 5618631549
BlockSize=32k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 5622842708
BlockSize=32k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 3284154231
BlockSize=64k DATA_BLOCK_ENCODING => 'NONE' 5609482650(5.22GB)
BlockSize=64k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 5612502105(5.23GB)
BlockSize=64k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 3273791654(3.05GB) -41.6%
COMPRESSION => 'LZO'
BlockSize=8k DATA_BLOCK_ENCODING => 'NONE' 1.13GB
BlockSize=8k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 1.13GB
BlockSize=8k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 997MB
BlockSize=16k DATA_BLOCK_ENCODING => 'NONE' 1.03GB
BlockSize=16k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 1.03GB
BlockSize=16k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 884MB
BlockSize=32k DATA_BLOCK_ENCODING => 'NONE' 981MB
BlockSize=32k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 983MB
BlockSize=32k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 800MB
BlockSize=64k DATA_BLOCK_ENCODING => 'NONE' 970MB
BlockSize=64k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 971MB
BlockSize=64k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 744MB -23.3%
{code}
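The percentages on the 64k lines can be re-derived from the sizes listed above, taking the 'NONE' encoding at the same block size as the baseline. The following is only a small sketch of that arithmetic; the class and method names (EncodingSavings, percentChange) are made up for illustration and are not part of HBase.

```java
public class EncodingSavings {
    // Relative size change of an encoded file versus a baseline, in percent.
    // Negative values mean the encoded file is smaller.
    static double percentChange(long baseline, long encoded) {
        return (encoded - baseline) * 100.0 / baseline;
    }

    public static void main(String[] args) {
        // Byte counts copied from the 64k, COMPRESSION => 'NONE' lines above.
        long none64k = 5609482650L;
        long v1 = 5612502105L;
        long v2 = 3273791654L;
        System.out.printf("ROW_INDEX_V1 vs NONE: %+.2f%%%n", percentChange(none64k, v1));
        System.out.printf("ROW_INDEX_V2 vs NONE: %+.2f%%%n", percentChange(none64k, v2));

        // LZO figures are only given in MB above, so the result is approximate.
        System.out.printf("LZO ROW_INDEX_V2 vs NONE: %+.2f%%%n", percentChange(970, 744));
    }
}
```

The same helper can be applied to the 8k, 16k and 32k lines to compare how the saving changes with block size.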
> ROW_INDEX_V2 DBE
> ----------------
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
> Issue Type: Sub-task
> Components: Performance
> Reporter: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version and has no storage optimization.
> ROW_INDEX_V2 adds storage optimization: it stores every row only once and
> stores the column family only once per HFileBlock.
> ROW_INDEX_V1 is:
> /**
> * Store cells following every row's start offset, so we can binary search to
> * a row's cells.
> *
> * Format:
> * flat cells
> * integer: number of rows
> * integer: row0's offset
> * integer: row1's offset
> * ....
> * integer: dataSize
> *
> */
> ROW_INDEX_V2 is:
> * row1 qualifier timestamp type value tag
> * qualifier timestamp type value tag
> * qualifier timestamp type value tag
> * row2 qualifier timestamp type value tag
> * row3 qualifier timestamp type value tag
> * qualifier timestamp type value tag
> * ....
> * integer: number of rows
> * integer: row0's offset
> * integer: row1's offset
> * ....
> * column family
> * integer: dataSize
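Both layouts above end with the same trailer: the number of rows, one start offset per row, and finally dataSize, which tells a reader where the row data stops and the index begins. A rough sketch of how a reader might use that trailer to binary search for a row is given below. It uses a toy byte layout (each row record is just an int length followed by the row bytes) that is an assumption made for illustration; it is not the HBase wire format, it omits the column family that V2 stores in the trailer, and the names RowIndexSketch, encode and seekRow are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class RowIndexSketch {
    // Toy layout (assumed for illustration only):
    //   [rowLen:int][row bytes] ...        one record per row, rows sorted
    //   [numRows:int][offset0:int] ... [offsetN-1:int][dataSize:int]
    static ByteBuffer encode(List<String> sortedRows) {
        byte[][] keys = new byte[sortedRows.size()][];
        int dataSize = 0;
        for (int i = 0; i < keys.length; i++) {
            keys[i] = sortedRows.get(i).getBytes(StandardCharsets.UTF_8);
            dataSize += 4 + keys[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(dataSize + 4 + 4 * keys.length + 4);
        int[] offsets = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            offsets[i] = buf.position();      // remember where this row starts
            buf.putInt(keys[i].length).put(keys[i]);
        }
        buf.putInt(keys.length);              // integer: number of rows
        for (int off : offsets) {
            buf.putInt(off);                  // integer: rowN's offset
        }
        buf.putInt(dataSize);                 // integer: dataSize
        buf.flip();
        return buf;
    }

    // Returns the start offset of the given row, or -1 if it is not in the block.
    static int seekRow(ByteBuffer block, String row) {
        byte[] target = row.getBytes(StandardCharsets.UTF_8);
        int dataSize = block.getInt(block.limit() - 4); // last int in the block
        int numRows = block.getInt(dataSize);           // index starts after the data
        int indexStart = dataSize + 4;
        int lo = 0;
        int hi = numRows - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int off = block.getInt(indexStart + 4 * mid);
            int cmp = compareRowAt(block, off, target);
            if (cmp == 0) {
                return off;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    // Unsigned lexicographic comparison of the row stored at 'off' with 'target'.
    static int compareRowAt(ByteBuffer block, int off, byte[] target) {
        int len = block.getInt(off);
        int n = Math.min(len, target.length);
        for (int i = 0; i < n; i++) {
            int a = block.get(off + 4 + i) & 0xff;
            int b = target[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return len - target.length;
    }
}
```

Because each row key appears once in front of its cells in the V2 layout, a search like this compares against one key per probed row rather than one key per cell.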