[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700116#action_12700116 ]

Zheng Shao commented on HIVE-352:
---------------------------------

hive-352-2009-4-17.patch: 

Talked with Yongqiang offline. Two more things:

1. RCFile.readFields is not very efficient (see below). I think we should
lazily decompress the stream instead of decompressing all of it and returning
the decompressor. The decompressed data can be very large and can easily cause
an out-of-memory error (consider a compression ratio of 1:10 or more).
{code}
// Eagerly copies the whole decompressed value into valBuf, one byte per call.
while (deflatFilter.available() > 0) {
  valBuf.write(valueIn, 1);
}
{code}

2. We also need to think about how to pass the information about which columns
are needed to Hive. Yongqiang is working on the design. If anybody has good
ideas, please chime in.
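To illustrate the difference in point 1, here is a minimal sketch using `java.util.zip` (DEFLATE) as a stand-in for the Hadoop codec and decompressor pool; it is not RCFile's code, and the class and method names (`LazyDecompressSketch`, `readAllEagerly`, `openLazily`) are hypothetical. The eager path materializes the full decompressed value, so peak memory grows with the decompressed size. The lazy path returns a stream that inflates bytes only as the caller reads them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class LazyDecompressSketch {

    // Compress with DEFLATE (assumed stand-in for whatever codec the file uses).
    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }

    // Eager: inflate the entire value into one buffer before anyone reads it.
    // Peak memory is proportional to the *decompressed* size.
    public static byte[] readAllEagerly(byte[] compressed) throws IOException {
        ByteArrayOutputStream valBuf = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                valBuf.write(chunk, 0, n);
            }
        }
        return valBuf.toByteArray();
    }

    // Lazy: return a stream; bytes are inflated only as the caller reads them.
    // Closing the stream is where a pooled decompressor would be released.
    public static DataInputStream openLazily(byte[] compressed) {
        return new DataInputStream(
            new InflaterInputStream(new ByteArrayInputStream(compressed)));
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[1 << 20]; // 1 MiB of zeros compresses very well
        byte[] compressed = compress(raw);

        byte[] all = readAllEagerly(compressed);
        System.out.println("eager: materialized " + all.length + " bytes");

        try (DataInputStream lazy = openLazily(compressed)) {
            byte[] firstField = new byte[16];
            lazy.readFully(firstField);
            System.out.println("lazy: read " + firstField.length + " bytes on demand");
        }
    }
}
```

With the lazy variant, a reader that only needs a few fields touches only the bytes it consumes, and the decompressor's lifetime is tied to the stream rather than to a fully inflated buffer.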


> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>         Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, 
> hive-352-2009-4-17.patch, HIve-352-draft-2009-03-28.patch, 
> Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will
> enhance Hive to support column-based storage.
> Actually, we have done some work on column-based storage on top of HDFS. I
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.