[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700098#action_12700098 ]
Zheng Shao commented on HIVE-352:
---------------------------------

hive-352-2009-4-17.patch: Very nice job!

Two more tests to add:
1. Big data test. Take a look at ql/src/test/queries/clientpositive/groupby_bigdata.q to see how we generate big data sets.
2. Complex column types. Take a look at ./ql/src/test/queries/clientpositive/input_lazyserde.q

Some other improvements:
1. ObjectInspectorFactory.getColumnarStructObjectInspector: I think you don't need the parameters byte separator and boolean lastColumnTakesRest. Just remove them.
2. ColumnarStruct.init: Can you cache and reuse the ByteArrayRef instead of doing ByteArrayRef br = new ByteArrayRef() every time? The assumption in Hive is that data is owned by its creator, and whoever wants to keep the data for later use needs to get a deep copy of the Object by calling ObjectInspectorUtils.copyToStandardObject.
3. ColumnarStruct: the comments should mention that the difference from LazyStruct is that it reads data through init(BytesRefArrayWritable cols).
4. Can you move all of the changes in the serde2.lazy package into a new package called serde2.columnar?
5. There seems to be a lot of shared code between LazySimpleSerDe and ColumnarSerDe, e.g. much of the functionality in init and serialize. Can you refactor LazySimpleSerDe and put the common functionality into public static methods, so that ColumnarSerDe can call them directly? You might also want to put the configuration of LazySimpleSerDe (nullString, separators, etc.) into a public static class, so that the public static methods can return it.
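
To illustrate improvement 2 above, here is a minimal sketch of the reuse-plus-deep-copy convention. It assumes hypothetical stand-in classes (ByteRef and Row are made up for this example and are not Hive's ByteArrayRef or ColumnarStruct), and copyField() only plays the role that ObjectInspectorUtils.copyToStandardObject plays in Hive.

```java
import java.nio.charset.StandardCharsets;

public class ReuseSketch {

    /** Hypothetical stand-in for a mutable byte holder (not Hive's ByteArrayRef). */
    static final class ByteRef {
        private byte[] data;

        void setData(byte[] data) { this.data = data; }

        byte[] getData() { return data; }
    }

    /** Hypothetical row object that is re-initialized for every record. */
    static final class Row {
        // Allocated once and reused, instead of "new ByteRef()" on every init call.
        private final ByteRef cachedRef = new ByteRef();

        /** Points the cached holder at the bytes of the next record. */
        void init(byte[] bytes) {
            cachedRef.setData(bytes);
        }

        /** Returns the shared holder; its contents change on the next init. */
        ByteRef field() {
            return cachedRef;
        }

        /** Deep copy for callers that need the value to survive the next init. */
        String copyField() {
            return new String(cachedRef.getData(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        Row row = new Row();

        row.init("first".getBytes(StandardCharsets.UTF_8));
        ByteRef kept = row.field();        // shallow reference, still owned by row
        String copied = row.copyField();   // deep copy, owned by the caller

        row.init("second".getBytes(StandardCharsets.UTF_8));

        // The shallow reference now shows the second record.
        System.out.println(new String(kept.getData(), StandardCharsets.UTF_8));
        // The deep copy still holds the first record.
        System.out.println(copied);
    }
}
```

Running main prints "second" and then "first": the reused holder costs no allocation per record, and only callers that hold on to a value pay for a copy.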

> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>         Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch,
>                      hive-352-2009-4-17.patch, HIve-352-draft-2009-03-28.patch,
>                      Hive-352-draft-2009-03-30.patch
>
>
> Column based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row oriented storage. In this issue, we will
> enhance Hive to support column based storage.
> Actually, we have done some work on column based storage on top of HDFS; I
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?