[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700542#action_12700542 ]
Zheng Shao commented on HIVE-352:
---------------------------------
Two major approaches for the RCFileFormat to work are:
1. Lazy deserialization (and decompression): The objects passed around in the
Hive operators can be wrappers of handles to underlying decompression streams,
which decompress the data on the fly.
2. Column-hinting: Let Hive tell the FileFormat which columns are needed and
which are not.
There is a major benefit of Option 1 in a common case like this:
{code}
SELECT key, value1, value2, value3, value4 FROM columnarTable
WHERE key = 'xxyyzz';
{code}
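If the predicate "key = 'xxyyzz'" is highly selective (that is, very few rows
match), we will end up decompressing very few blocks of value1 to value4.
This is not possible with Option 2: every column the query lists is needed, so
a column hint alone cannot skip value1 to value4.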
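To make the Option 1 argument concrete, here is a minimal sketch, not the
RCFile implementation. All names in it (LazyColumnSketch, LazyCell, scan) are
hypothetical, and it assumes one compressed block per cell purely to keep the
example short. It counts how many value blocks are actually decompressed when
the filter on key is evaluated before the value columns are touched.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration of lazy decompression; not Hive or RCFile code.
final class LazyColumnSketch {

    /** Wrapper that keeps the compressed bytes and inflates them on first get(). */
    static final class LazyCell {
        private final byte[] compressed;
        private final AtomicInteger inflations; // shared counter, for illustration only
        private String value;                   // cached after the first get()

        LazyCell(String raw, AtomicInteger inflations) {
            this.compressed = compress(raw.getBytes(StandardCharsets.UTF_8));
            this.inflations = inflations;
        }

        String get() {
            if (value == null) {
                value = new String(decompress(compressed), StandardCharsets.UTF_8);
                inflations.incrementAndGet();
            }
            return value;
        }
    }

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input: stop instead of spinning
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /**
     * Scans the rows, evaluates the key filter first, and reads the value
     * columns only for matching rows. Returns the number of value blocks
     * that were decompressed.
     */
    static int scan(String[] keys, String wanted, int valueColumns) {
        AtomicInteger inflations = new AtomicInteger();
        for (int row = 0; row < keys.length; row++) {
            LazyCell[] values = new LazyCell[valueColumns];
            for (int c = 0; c < valueColumns; c++) {
                values[c] = new LazyCell("value" + (c + 1) + "-row" + row, inflations);
            }
            if (keys[row].equals(wanted)) {
                for (LazyCell v : values) {
                    v.get(); // only now is the block decompressed
                }
            }
        }
        return inflations.get();
    }

    public static void main(String[] args) {
        String[] keys = {"a", "xxyyzz", "b", "c"};
        System.out.println("value blocks decompressed: " + scan(keys, "xxyyzz", 4));
    }
}
```

With four rows, one match, and four value columns, only the four blocks of the
matching row are inflated; the other twelve stay compressed. A reader driven
only by a column hint would have to decompress all sixteen, because the query
asks for every value column.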
> Make Hive support column based storage
> --------------------------------------
>
> Key: HIVE-352
> URL: https://issues.apache.org/jira/browse/HIVE-352
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch,
> hive-352-2009-4-17.patch, HIve-352-draft-2009-03-28.patch,
> Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will
> enhance Hive to support column-based storage.
> Actually, we have done some work on column-based storage on top of HDFS; I
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?