[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700116#action_12700116 ]
Zheng Shao commented on HIVE-352: --------------------------------- hive-352-2009-4-17.patch: Talked with Yongqiang offline. Two more things: 1. RCFile.readFields is not very efficient (see below). I think we should lazily decompress the stream instead of decompress all of it and return the decompressor. The reason is that decompressed data can be very big and easily go out-of-memory (if we consider 1:10 or more compression ratio) {code} while (deflatFilter.available() > 0) valBuf.write(valueIn, 1); {code} 2. Also we need to think about how we can pass the information of which columns are needed to Hive. Yongqiang is working on designing that. If anybody have good ideas, please chime in. > Make Hive support column based storage > -------------------------------------- > > Key: HIVE-352 > URL: https://issues.apache.org/jira/browse/HIVE-352 > Project: Hadoop Hive > Issue Type: New Feature > Reporter: He Yongqiang > Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, > hive-352-2009-4-17.patch, HIve-352-draft-2009-03-28.patch, > Hive-352-draft-2009-03-30.patch > > > column based storage has been proven a better storage layout for OLAP. > Hive does a great job on raw row oriented storage. In this issue, we will > enhance hive to support column based storage. > Acctually we have done some work on column based storage on top of hdfs, i > think it will need some review and refactoring to port it to Hive. > Any thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.