[ https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205892#comment-13205892 ]

He Yongqiang commented on HBASE-5313:
-------------------------------------

@Todd, with such a small block size and the data already sorted, I was also 
thinking it would be very hard to optimize the space.

So we did some experiments by modifying today's HFileWriter. It turns out it 
can still save a lot of space if we play a few more tricks.

Here are test results (block size is 16KB):

*42MB HFile, with Delta compression and LZO compression* (with the default 
settings on Apache trunk)

*30MB HFile, with Columnar layout, Delta compression, and LZO compression.*

Inside one block, first put all the row keys in that block and apply delta 
compression followed by LZO compression. After the row keys, put all the 
column family data in that block and apply Delta+LZO to it. Then do the same 
for column_qualifier, and so on.
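
For illustration, here is a minimal sketch in Java of that per-column 
Delta+LZO step. The class and helper names are hypothetical (this is not the 
actual HFileWriter code), and java.util.zip.Deflater stands in for LZO, which 
is not part of the JDK:

    import java.io.ByteArrayOutputStream;
    import java.util.List;
    import java.util.zip.Deflater;

    // Hypothetical sketch: delta-encode one column (e.g. all the row keys
    // of a block, already sorted), then compress the result. Deflater
    // stands in for LZO here.
    public class ColumnarBlockSketch {

      // Delta step: store each entry as (shared-prefix length, suffix
      // length, suffix bytes) relative to the previous entry. Sorted keys
      // share long prefixes, so the column shrinks a lot even before the
      // general-purpose compressor runs.
      static byte[] deltaEncode(List<byte[]> column) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prev = new byte[0];
        for (byte[] cur : column) {
          int shared = 0;
          int max = Math.min(prev.length, cur.length);
          while (shared < max && prev[shared] == cur[shared]) {
            shared++;
          }
          writeVInt(out, shared);
          writeVInt(out, cur.length - shared);
          out.write(cur, shared, cur.length - shared);
          prev = cur;
        }
        return out.toByteArray();
      }

      // Compression step applied to the delta-encoded column.
      static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
          out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
      }

      // Minimal varint so the prefix/suffix lengths stay compact.
      static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
          out.write((v & 0x7F) | 0x80);
          v >>>= 7;
        }
        out.write(v);
      }
    }

The same deltaEncode + compress pair would then run over the column family 
column, the qualifier column, and so on, each stored as its own region of the 
block.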

*24MB HFile, with Columnar layout, sorted value column, sorted 
column_qualifier column, and LZO compression.*

Inside one block, first put all the row keys in that block and apply delta 
compression followed by LZO compression. After the row keys, put all the 
column family data in that block and apply Delta+LZO to it. Then put the 
column_qualifier data, sort it, and apply Delta+LZO. The TS column and the 
Code column are processed the same way as the column family; the value column 
is processed the same way as column_qualifier. So the disk format is the same 
as for the 30MB HFile, except that all the data for 'column_qualifier' and 
'value' is sorted separately.
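
As a sketch of the extra sort trick (hypothetical names again): the column is 
ordered byte-lexicographically before the same Delta+LZO pipeline runs, so 
similar byte strings cluster together and prefix deltas remove far more 
redundancy. Presumably a small permutation index would also be stored so rows 
can be realigned on read, though that detail is not spelled out above:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical sketch: sort a column (e.g. column_qualifier or value)
    // before delta encoding. A permutation index (not shown) would be
    // needed to restore the original row alignment when reading.
    public class SortedColumnSketch {
      static List<byte[]> sortColumn(List<byte[]> column) {
        List<byte[]> sorted = new ArrayList<>(column);
        sorted.sort(Arrays::compareUnsigned); // unsigned byte order, like HBase keys
        return sorted;
      }
      // The sorted column then goes through the same delta + compression
      // steps sketched above for the row-key column.
    }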

Of the 24MB file, 6MB is used to store the row keys, 7MB to store the 
column_qualifier data, and 6MB to store the values.

More ideas are welcome! 

> Restructure hfiles layout for better compression
> ------------------------------------------------
>
>                 Key: HBASE-5313
>                 URL: https://issues.apache.org/jira/browse/HBASE-5313
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these kvs 
> on disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys at the beginning of the 
> block (let's call this the key-section) and then store all their 
> corresponding values towards the end of the block. This would allow us to 
> avoid even decompressing the values when we are scanning and skipping over 
> rows in the block.
> Any other ideas? 
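
To make the key-section / value-section idea above concrete, here is one 
hypothetical sketch of such a block layout: each section is compressed 
independently, so a scan that only inspects keys inflates the key section and 
never touches the values. Deflater/Inflater stand in for the real block codec:

    import java.io.ByteArrayOutputStream;
    import java.nio.ByteBuffer;
    import java.util.zip.DataFormatException;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    // Hypothetical block layout:
    // [key-section length][compressed keys][compressed values]
    public class SplitBlockSketch {

      static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
          out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
      }

      // Compress keys and values separately and concatenate the sections.
      static byte[] writeBlock(byte[] keys, byte[] values) {
        byte[] keySec = deflate(keys);
        byte[] valSec = deflate(values);
        ByteBuffer block = ByteBuffer.allocate(4 + keySec.length + valSec.length);
        block.putInt(keySec.length).put(keySec).put(valSec);
        return block.array();
      }

      // A key-only scan reads and inflates just the key section; the
      // value section is skipped entirely.
      static byte[] readKeysOnly(byte[] block, int rawKeyLen)
          throws DataFormatException {
        ByteBuffer buf = ByteBuffer.wrap(block);
        byte[] keySec = new byte[buf.getInt()];
        buf.get(keySec);
        Inflater inf = new Inflater();
        inf.setInput(keySec);
        byte[] keys = new byte[rawKeyLen];
        inf.inflate(keys);
        inf.end();
        return keys;
      }
    }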

