[ 
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700542#action_12700542
 ] 

Zheng Shao commented on HIVE-352:
---------------------------------

There are two major approaches to making the RCFileFormat work:
1. Lazy deserialization (and decompression): The Objects passed around in the 
Hive Operators can be wrappers of handles to underlying decompression streams 
which will decompress the data on the fly.
2. Column-hinting: Let Hive tell the FileFormat which columns are needed and 
which are not.
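
To make Option 1 concrete, here is a minimal sketch of such a wrapper, assuming 
plain java.util.zip compression. LazyColumnBlock and its inflateCalls counter 
are hypothetical names for illustration only; they are not part of Hive or of 
the attached patches.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical wrapper (not a Hive class): keeps a column block compressed
// and inflates it only when a caller first asks for the value.
class LazyColumnBlock {
    private final byte[] compressed;
    private final int rawLength;
    private String value;      // cached after the first decompression
    int inflateCalls = 0;      // instrumentation for the demo only

    private LazyColumnBlock(byte[] compressed, int rawLength) {
        this.compressed = compressed;
        this.rawLength = rawLength;
    }

    // Compress the text eagerly, as a writer would when the file is created.
    static LazyColumnBlock of(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return new LazyColumnBlock(out.toByteArray(), raw.length);
    }

    // Decompress on first access; later calls reuse the cached value.
    String get() {
        if (value == null) {
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(compressed);
                byte[] raw = new byte[rawLength];
                int off = 0;
                while (off < rawLength && !inflater.finished()) {
                    off += inflater.inflate(raw, off, rawLength - off);
                }
                value = new String(raw, StandardCharsets.UTF_8);
                inflateCalls++;
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt column block", e);
            } finally {
                inflater.end();
            }
        }
        return value;
    }

    public static void main(String[] args) {
        LazyColumnBlock block = LazyColumnBlock.of("value1 for row 0");
        System.out.println(block.inflateCalls); // nothing inflated yet
        System.out.println(block.get());
        block.get();
        System.out.println(block.inflateCalls); // still a single inflation
    }
}
```

An operator that never calls get() on a block never pays for its 
decompression, which is the property Option 1 relies on.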

There is a major benefit of Option 1 in a common case like this:
{code}
SELECT key, value1, value2, value3, value4 FROM columnarTable WHERE key = 'xxyyzz';
{code}
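If the predicate "key = 'xxyyzz'" is highly selective (very few rows match), 
we will end up decompressing very few blocks of value1 through value4.
This is not possible with Option 2: the query requests all five columns, so 
column hinting has nothing to prune.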
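
The difference can be sketched with a toy model that only counts block 
decompressions. Everything here is an assumption made for illustration: the 
class and method names are hypothetical, each "row group" holds one key block 
and four value blocks, and "decompression" is simulated by a counter rather 
than a real codec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Toy model (not Hive code): counts how many column blocks each option touches.
class LazyVsHintDemo {
    static int decompressions = 0;

    // Hypothetical lazy cell: the first get() stands in for one decompression.
    static final class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;
        Lazy(Supplier<T> loader) { this.loader = loader; }
        T get() {
            if (!loaded) { value = loader.get(); loaded = true; decompressions++; }
            return value;
        }
    }

    // One row group: a key column block plus four value column blocks.
    static final class RowGroup {
        final Lazy<String[]> keys;
        final List<Lazy<String[]>> values = new ArrayList<>();
        RowGroup(String[] rawKeys, int group) {
            this.keys = new Lazy<>(() -> rawKeys);
            for (int c = 1; c <= 4; c++) {
                final String cell = "value" + c + "-g" + group;
                final int n = rawKeys.length;
                values.add(new Lazy<>(() -> {
                    String[] col = new String[n];
                    Arrays.fill(col, cell);
                    return col;
                }));
            }
        }
    }

    // Builds `count` row groups; only group `matchGroup` contains 'xxyyzz'.
    static List<RowGroup> buildGroups(int count, int matchGroup) {
        List<RowGroup> groups = new ArrayList<>();
        for (int g = 0; g < count; g++) {
            String[] keys = {"k" + g + "a", g == matchGroup ? "xxyyzz" : "k" + g + "b"};
            groups.add(new RowGroup(keys, g));
        }
        return groups;
    }

    // Option 1: key blocks are always read; value blocks only for matching rows.
    static int lazyScan(List<RowGroup> groups, String wanted) {
        decompressions = 0;
        for (RowGroup g : groups) {
            String[] ks = g.keys.get();
            for (int r = 0; r < ks.length; r++) {
                if (ks[r].equals(wanted)) {
                    for (Lazy<String[]> col : g.values) {
                        String cell = col.get()[r]; // forces decompression
                    }
                }
            }
        }
        return decompressions;
    }

    // Option 2: all five columns are hinted as needed, so every block is read.
    static int hintedScan(List<RowGroup> groups, String wanted) {
        decompressions = 0;
        for (RowGroup g : groups) {
            g.keys.get();
            for (Lazy<String[]> col : g.values) col.get();
        }
        return decompressions;
    }

    public static void main(String[] args) {
        System.out.println(lazyScan(buildGroups(100, 42), "xxyyzz"));   // 104
        System.out.println(hintedScan(buildGroups(100, 42), "xxyyzz")); // 500
    }
}
```

With 100 row groups and a single match, the lazy scan touches the 100 key 
blocks plus the 4 value blocks of the matching group, while the hinted scan 
touches all 500 blocks.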


> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, 
> hive-352-2009-4-17.patch, HIve-352-draft-2009-03-28.patch, 
> Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will
> enhance Hive to support column-based storage.
> Actually, we have already done some work on column-based storage on top of
> HDFS; I think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
