[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701074#action_12701074 ]
Zheng Shao commented on HIVE-352:
---------------------------------

Yongqiang talked with me offline, since 1 conflicts with Joydeep's earlier comment:

"We have seen that having a number of open codecs can hurt in memory usage - that's one open question for me - can we actually afford to open N concurrent compressed streams (assuming each column is stored compressed separately)."

I think we should run an experiment to see how much memory each concurrent compressing stream takes. In practice, most tables will have fewer than 100 columns, so I guess that if each codec takes 100K-500K of memory it is affordable (total memory usage: 10MB-50MB). Otherwise we need to rethink 1.

> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch,
>                      hive-352-2009-4-17.patch, hive-352-2009-4-19.patch,
>                      HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
>
> Column based storage has been proven a better storage layout for OLAP.
> Hive does a great job on raw row oriented storage. In this issue, we will
> enhance Hive to support column based storage.
> Actually we have done some work on column based storage on top of HDFS; I
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
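
For reference, the back-of-the-envelope estimate in the comment (one open codec per column, 100 columns, 100K-500K per codec) can be reproduced with a minimal sketch like the one below. The class and method names are hypothetical and not part of any patch on this issue, and "K" and "MB" are assumed to be decimal units so that the totals match the 10MB-50MB figure quoted above. The per-codec numbers are the guesses from the comment, not measurements.

```java
public class CodecMemoryEstimate {

    // Hypothetical helper: total memory if every column keeps its own
    // compression stream open at the same time.
    static long totalBytes(int columns, long bytesPerCodec) {
        return (long) columns * bytesPerCodec;
    }

    public static void main(String[] args) {
        int columns = 100;          // "most tables will have fewer than 100 columns"
        long lowPerCodec = 100_000L;  // 100K per codec (assumed decimal)
        long highPerCodec = 500_000L; // 500K per codec (assumed decimal)

        long lowTotal = totalBytes(columns, lowPerCodec);
        long highTotal = totalBytes(columns, highPerCodec);

        // Report the range in decimal megabytes.
        System.out.println("low  total: " + lowTotal / 1_000_000L + "MB");
        System.out.println("high total: " + highTotal / 1_000_000L + "MB");
    }
}
```

The estimate scales linearly with both the column count and the per-codec footprint, so the experiment proposed in the comment only needs to pin down the second factor for each codec under consideration.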