[
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203324#comment-13203324
]
He Yongqiang commented on HBASE-5313:
-------------------------------------
As discussed earlier, one thing we can try is something like Hive's
RCFile. The difference from Hive is that an HBase row's value is not a single
type. If it turns out that a columnar file format helps, we can employ a nested
columnar format for the value (like what Dremel does). There is one thread on
Quora about Dremel:
http://www.quora.com/How-will-Googles-Dremel-change-future-Hadoop-releases.
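The RCFile-style idea above can be sketched in a few lines: take row-oriented
(rowKey, column, value) cells and regroup them column by column, so values of
the same column sit adjacent and compress better. This is a hypothetical
illustration, not HBase or Hive code; the `Cell` record and `regroup` helper
are made up for this sketch (assumes a recent JDK with records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the RCFile-style regrouping: within one block,
// values are laid out per column instead of per row, so runs of
// similar-looking bytes end up next to each other before compression.
public class ColumnarRegroup {
    record Cell(String row, String column, String value) {}

    // Returns column -> values in arrival order (one "column run" per column).
    static Map<String, List<String>> regroup(List<Cell> cells) {
        Map<String, List<String>> byColumn = new TreeMap<>();
        for (Cell c : cells) {
            byColumn.computeIfAbsent(c.column(), k -> new ArrayList<>())
                    .add(c.value());
        }
        return byColumn;
    }
}
```

A general-purpose codec run over each column's run would then see homogeneous
data, which is where the compression win is expected to come from.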
> Restructure hfiles layout for better compression
> ------------------------------------------------
>
> Key: HBASE-5313
> URL: https://issues.apache.org/jira/browse/HBASE-5313
> Project: HBase
> Issue Type: Improvement
> Components: io
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
>
> An HFile block contains a stream of key-values. Can we organize these kvs
> on disk in a better way so that we get much greater compression ratios?
> One option (thanks, Prakash) is to store all the keys at the beginning of the
> block (let's call this the key-section) and then store all their
> corresponding values towards the end of the block. This would allow us to
> avoid even decompressing the values when we are scanning and skipping over
> rows in the block.
> Any other ideas?
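The key-section layout proposed above can be sketched as follows. This is a
hypothetical toy format, not the actual HFile block encoding: all key bytes are
written at the front of the block and all value bytes after them, so a scan
that only needs keys can decode the key-section and never touch (or
decompress) the value bytes. The `KeySectionBlock` class and its wire layout
are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy block layout: [keyCount][len+bytes per key ...][len+bytes per value ...]
// Keys and values are given as parallel lists (keys.get(i) owns values.get(i)).
public class KeySectionBlock {
    static byte[] write(List<byte[]> keys, List<byte[]> values) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(keys.size());
        for (byte[] k : keys) { out.writeInt(k.length); out.write(k); }   // key-section
        for (byte[] v : values) { out.writeInt(v.length); out.write(v); } // value-section
        return bos.toByteArray();
    }

    // Decode only the key-section; the value bytes at the tail are never read.
    static List<byte[]> readKeys(byte[] block) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
        int n = in.readInt();
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            byte[] k = new byte[in.readInt()];
            in.readFully(k);
            keys.add(k);
        }
        return keys;
    }
}
```

With the two sections compressed separately, a skip-heavy scan would only pay
the decompression cost of the (typically much smaller) key-section.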