[ 
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701074#action_12701074
 ] 

Zheng Shao commented on HIVE-352:
---------------------------------

Yongqiang talked with me offline since 1 conflicts with Joydeep's earlier 
comment: "We have seen that having a number of open codecs can hurt in memory 
usage - that's one open question for me - can we actually afford to open N 
concurrent compressed streams (assuming each column is stored compressed 
separately)."

I think we should run an experiment to see how much memory each concurrent 
compression stream takes. In practice, most tables will have fewer than 100 
columns, so if each codec takes 100K-500K of memory, opening one per column is 
affordable (total memory usage: 10MB-50MB). Otherwise we need to rethink 1.
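
As a rough illustration of the estimate above, here is a minimal sketch in 
Python. It is not Hive or Hadoop code: zlib stands in for whatever codec is 
actually used, the function names are made up for this example, and 1MB is 
taken as 1000K to match the rounding in the figures above. The second function 
only shows what "one open compressed stream per column" looks like; it does not 
measure memory, which still needs the real experiment.

```python
import zlib


def estimate_total_mb(num_columns, per_codec_kb):
    """Estimated total codec memory in MB (assumes 1 MB = 1000 KB)."""
    return num_columns * per_codec_kb / 1000


def compress_columns(rows):
    """Compress each column through its own stream, keeping all streams open at once.

    zlib is only a stand-in here; a real codec may use far more or less memory.
    """
    num_columns = len(rows[0])
    # One open compressor per column, all alive for the whole write.
    streams = [zlib.compressobj() for _ in range(num_columns)]
    chunks = [[] for _ in range(num_columns)]
    for row in rows:
        for i, value in enumerate(row):
            chunks[i].append(streams[i].compress(value.encode("utf-8") + b"\n"))
    # Flush every stream so each column's buffer is a complete zlib stream.
    for i, stream in enumerate(streams):
        chunks[i].append(stream.flush())
    return [b"".join(parts) for parts in chunks]


if __name__ == "__main__":
    for per_codec_kb in (100, 500):
        print(per_codec_kb, "K per codec ->", estimate_total_mb(100, per_codec_kb), "MB")
```

With 100 columns, this gives the 10MB and 50MB bounds quoted above; the open 
question is whether real codecs stay inside the 100K-500K range.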


> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, 
> hive-352-2009-4-17.patch, hive-352-2009-4-19.patch, 
> HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP. 
> Hive does a great job on raw row-oriented storage. In this issue, we will 
> enhance Hive to support column-based storage. 
> Actually, we have done some work on column-based storage on top of HDFS; I 
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
