[ https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203918#comment-13203918 ]

Prakash Khemani commented on HBASE-5313:
----------------------------------------

The values can be kept compressed in memory. We can decompress them on
demand when writing out the key-values during RPCs or compactions.
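
A minimal Python sketch of keeping a block's values compressed in memory and decompressing them only when key-values are written out. The class `CompressedValueBlock` and its methods are hypothetical (not HBase APIs), and zlib stands in for whatever codec a real block would use.

```python
import zlib
from typing import Iterator, List, Tuple


class CompressedValueBlock:
    """Hypothetical in-memory block: keys kept as-is, values kept zlib-compressed."""

    def __init__(self, keys: List[bytes], values: List[bytes]) -> None:
        if len(keys) != len(values):
            raise ValueError("need exactly one value per key")
        self.keys = keys
        # Value lengths let us cut individual values back out after decompression.
        self.value_lengths = [len(v) for v in values]
        # Only the compressed form of the values is retained in memory.
        self.compressed_values = zlib.compress(b"".join(values))

    def write_out(self) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs, decompressing only at write-out time
        (e.g. when serving an RPC or rewriting the block during a compaction)."""
        raw = zlib.decompress(self.compressed_values)  # transient, never stored on self
        offset = 0
        for key, length in zip(self.keys, self.value_lengths):
            yield key, raw[offset:offset + length]
            offset += length


block = CompressedValueBlock([b"row1", b"row2"], [b"v" * 100, b"w" * 100])
print(len(block.compressed_values) < 200)  # True: repetitive values shrink well
print([k for k, _ in block.write_out()])   # [b'row1', b'row2']
```

The decompressed bytes are not cached here, so between write-outs memory holds only the compressed form; a real implementation would have to weigh that against the CPU cost of decompressing repeatedly.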

Each key has to have a pointer to its value. The pointer can be implicit,
derived from the value lengths, if all the values are stored in the same
order as the keys.
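
Assuming each key entry records its value's length, the implicit pointer is just an exclusive prefix sum over the value lengths. A sketch in Python; `implicit_value_offsets` and `value_at` are hypothetical names used only for illustration.

```python
from itertools import accumulate
from typing import List


def implicit_value_offsets(value_lengths: List[int], values_start: int = 0) -> List[int]:
    """Offsets of each value, derived only from the value lengths.

    Assumes values are stored back to back in the same order as their keys, so the
    offset of value i is values_start plus the lengths of values 0..i-1
    (an exclusive prefix sum). No per-key pointer needs to be stored.
    """
    if not value_lengths:
        return []
    return list(accumulate(value_lengths[:-1], initial=values_start))


def value_at(value_section: bytes, value_lengths: List[int], i: int) -> bytes:
    """Fetch value i from a value section laid out in key order."""
    offsets = implicit_value_offsets(value_lengths)
    return value_section[offsets[i]:offsets[i] + value_lengths[i]]


print(implicit_value_offsets([5, 4, 5]))          # [0, 5, 9]
print(value_at(b"alphabetagamma", [5, 4, 5], 2))  # b'gamma'
```

Computing the offsets once when a block is loaded makes each lookup O(1); recomputing them per lookup, as `value_at` does for brevity, costs O(n).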

The value pointer has to be explicit if the values are stored in a
different order than the keys. We might want to write out the values in a
different order if we want to do per-column compression. While writing out
the HFileBlock, the following can be done: group all the values by their
column identifier, independently compress and write out each group of
values, then go back to the keys and update the value pointers.


On 2/8/12 11:50 AM, "Lars Hofhansl (Commented) (JIRA)" wrote:
> Restructure hfiles layout for better compression
> ------------------------------------------------
>
>                 Key: HBASE-5313
>                 URL: https://issues.apache.org/jira/browse/HBASE-5313
>             Project: HBase
>          Issue Type: Improvement
>          Components: io
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these KVs
> on disk in a better way so that we get much greater compression ratios?
> One option (thanks Prakash) is to store all the keys at the beginning of the
> block (let's call this the key-section) and then store all their
> corresponding values towards the end of the block. This will allow us to
> not even decompress the values when we are scanning and skipping over rows in
> the block.
> Any other ideas?
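
The key-section/value-section layout described in the issue can be sketched in Python. The byte layout, the separate zlib compression of the two sections, and the names `build_block` and `seek` are assumptions made for illustration; the point is that `seek` walks only the key section and decompresses the values only when it finds the target key.

```python
import struct
import zlib
from typing import List, Optional, Tuple


def build_block(kvs: List[Tuple[bytes, bytes]]) -> bytes:
    """Hypothetical layout: [4-byte key-section size][compressed keys][compressed values].

    kvs must be sorted by key. Each key-section entry is (key length, value length, key);
    value offsets are implicit because the values follow the same order as the keys.
    """
    key_section = b"".join(struct.pack(">II", len(k), len(v)) + k for k, v in kvs)
    compressed_keys = zlib.compress(key_section)
    compressed_values = zlib.compress(b"".join(v for _, v in kvs))
    return struct.pack(">I", len(compressed_keys)) + compressed_keys + compressed_values


def seek(block: bytes, target: bytes) -> Optional[bytes]:
    """Scan only the key section; the value section is decompressed only on a hit."""
    (ck_len,) = struct.unpack_from(">I", block, 0)
    keys = zlib.decompress(block[4:4 + ck_len])
    pos, value_offset = 0, 0
    while pos < len(keys):
        key_len, val_len = struct.unpack_from(">II", keys, pos)
        key = keys[pos + 8:pos + 8 + key_len]
        if key == target:
            values = zlib.decompress(block[4 + ck_len:])
            return values[value_offset:value_offset + val_len]
        if key > target:
            break  # keys are sorted, so the target is not in this block
        # Skip this row: advance past its key and its (still compressed) value.
        pos += 8 + key_len
        value_offset += val_len
    return None


block = build_block([(b"a", b"1" * 10), (b"b", b"22"), (b"c", b"333")])
print(seek(block, b"c"))   # b'333'
print(seek(block, b"bb"))  # None
```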

--
This message is automatically generated by JIRA.