[ 
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700098#action_12700098
 ] 

Zheng Shao commented on HIVE-352:
---------------------------------

hive-352-2009-4-17.patch:

Very nice job!

Two more tests to add:
1. Big data test: Take a look at 
ql/src/test/queries/clientpositive/groupby_bigdata.q to see how we generate big 
data sets.
2. Complex column types: Take a look at 
./ql/src/test/queries/clientpositive/input_lazyserde.q

Some other improvements:
1. ObjectInspectorFactory.getColumnarStructObjectInspector: I don't think you 
need the byte separator and boolean lastColumnTakesRest parameters. Please 
remove them.
2. ColumnarStruct.init: Can you cache and reuse the ByteArrayRef instead of 
doing ByteArrayRef br = new ByteArrayRef() every time? The convention in Hive 
is that data is owned by its creator, and whoever wants to keep the data for 
later use needs to make a deep copy of the Object by calling 
ObjectInspectorUtils.copyToStandardObject.
3. ColumnarStruct: the comments should mention that the difference from 
LazyStruct is that it reads data through init(BytesRefArrayWritable cols).
4. Can you move all the changes made in the serde2.lazy package into a new 
package called serde2.columnar?
5. It seems there is a lot of shared code between LazySimpleSerDe and 
ColumnarSerDe, e.g. much of the functionality in init and serialize. Can you 
refactor LazySimpleSerDe and put the common functionality into public static 
methods that ColumnarSerDe can call directly? You might also want to put the 
configuration of LazySimpleSerDe (nullString, separators, etc.) into a public 
static class, so that the public static methods can return it.
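
To make item 2 concrete, here is a minimal sketch of the reuse pattern. 
ByteHolder and ReusableRow are hypothetical stand-ins written for this 
example; they are not the actual ByteArrayRef or ColumnarStruct classes, and 
copyOut() only plays the role that a deep copy would play for a caller that 
wants to keep the data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for a reusable byte holder (not Hive's ByteArrayRef). */
final class ByteHolder {
    private byte[] data;

    void setData(byte[] data) { this.data = data; }

    byte[] getData() { return data; }
}

/** Hypothetical row object that is re-initialized for every record. */
final class ReusableRow {
    // Allocated once and reused by every init() call.
    private final ByteHolder holder = new ByteHolder();

    void init(byte[] rawRow) {
        // Point the existing holder at the new bytes; no new ByteHolder per row.
        holder.setData(rawRow);
    }

    /** View backed by the shared holder; its contents change on the next init(). */
    ByteHolder view() { return holder; }

    /** Deep copy for callers that need to keep the data past the next init(). */
    byte[] copyOut() {
        byte[] d = holder.getData();
        return Arrays.copyOf(d, d.length);
    }
}

final class ReuseDemo {
    public static void main(String[] args) {
        ReusableRow row = new ReusableRow();
        row.init("first".getBytes(StandardCharsets.UTF_8));

        ByteHolder view = row.view();   // shares state with the row
        byte[] kept = row.copyOut();    // independent copy

        row.init("second".getBytes(StandardCharsets.UTF_8));

        // The view follows the row; the copy keeps the earlier record.
        System.out.println(new String(view.getData(), StandardCharsets.UTF_8));
        System.out.println(new String(kept, StandardCharsets.UTF_8));
    }
}
```

The trade-off is the usual one: reuse avoids one allocation per row, at the 
cost of requiring any caller that holds on to a row to copy it first.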


> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>         Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, 
> hive-352-2009-4-17.patch, HIve-352-draft-2009-03-28.patch, 
> Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP. 
> Hive does a great job on raw row-oriented storage. In this issue, we will 
> enhance Hive to support column-based storage. 
> Actually, we have done some work on column-based storage on top of HDFS; I 
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?
