[
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700098#action_12700098
]
Zheng Shao commented on HIVE-352:
---------------------------------
hive-352-2009-4-17.patch:
Very nice job!
2 more tests to add:
1. Big data test. Take a look at
ql/src/test/queries/clientpositive/groupby_bigdata.q to see how we generate big
data sets.
2. Complex column types: Take a look at
./ql/src/test/queries/clientpositive/input_lazyserde.q
Some other improvements:
1. ObjectInspectorFactory.getColumnarStructObjectInspector: I don't think you
need byte separator and boolean lastColumnTakesRest here. Please just remove
them.
2. ColumnarStruct.init: Can you cache and reuse the ByteArrayRef instead of
doing ByteArrayRef br = new ByteArrayRef() every time? The assumption in Hive
is that data is owned by its creator, and whoever wants to keep the data for
later use needs to make a deep copy of the Object by calling
ObjectInspectorUtils.copyToStandardObject.
3. ColumnarStruct: the comments should mention that the difference from
LazyStruct is that it reads data through init(BytesRefArrayWritable cols).
4. Can you move all of the changes in the serde2.lazy package into a new
package called serde2.columnar?
5. It seems there is a lot of shared code between LazySimpleSerDe and
ColumnarSerDe, e.g. much of the logic in init and serialize. Can you
refactor LazySimpleSerDe and put the common logic into public static methods,
so that ColumnarSerDe can call them directly? You might also want to put the
configuration of LazySimpleSerDe (nullString, separators, etc.) into a public
static class that those public static methods can return.
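To illustrate item 2 under "Some other improvements", here is a rough sketch of
the reuse pattern. It is not taken from the patch: BufferRef, ReusableRow and
copyToStandard are hypothetical stand-ins for ByteArrayRef, ColumnarStruct and
ObjectInspectorUtils.copyToStandardObject, written with plain JDK classes so it
runs on its own. The holders are allocated once and re-pointed on every init,
so a caller that wants to keep a row has to copy it first.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a byte-array holder such as ByteArrayRef.
final class BufferRef {
    private byte[] data;
    void setData(byte[] data) { this.data = data; }
    byte[] getData() { return data; }
}

// Hypothetical stand-in for a columnar row object; not the real ColumnarStruct.
final class ReusableRow {
    // Allocated once in the constructor and reused for every row.
    private final BufferRef[] refs;

    ReusableRow(int numColumns) {
        refs = new BufferRef[numColumns];
        for (int i = 0; i < numColumns; i++) {
            refs[i] = new BufferRef();
        }
    }

    // Re-point the cached holders at the next row's bytes; nothing is allocated.
    void init(byte[][] columns) {
        for (int i = 0; i < refs.length; i++) {
            refs[i].setData(columns[i]);
        }
    }

    BufferRef getRef(int i) { return refs[i]; }

    String getField(int i) {
        return new String(refs[i].getData(), StandardCharsets.UTF_8);
    }

    // Deep copy for callers that need the row to survive the next init().
    String[] copyToStandard() {
        String[] copy = new String[refs.length];
        for (int i = 0; i < refs.length; i++) {
            copy[i] = getField(i);
        }
        return copy;
    }

    // Test helper: encode each value as the bytes of one column.
    static byte[][] cols(String... values) {
        byte[][] out = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i].getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }
}

final class ReuseSketch {
    public static void main(String[] args) {
        ReusableRow row = new ReusableRow(2);
        row.init(ReusableRow.cols("a", "1"));
        BufferRef held = row.getRef(0);        // a reference kept without copying
        String[] kept = row.copyToStandard();  // a deep copy of the first row

        row.init(ReusableRow.cols("b", "2"));  // the creator moves on to the next row

        System.out.println(held == row.getRef(0));  // true: same holder is reused
        System.out.println(Arrays.toString(kept));  // [a, 1]: the copy is unaffected
        System.out.println(new String(held.getData(), StandardCharsets.UTF_8)); // b
    }
}
```

The last line of output is the reason for the deep-copy rule: the reference
that was held without copying now shows the second row's data.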
> Make Hive support column based storage
> --------------------------------------
>
> Key: HIVE-352
> URL: https://issues.apache.org/jira/browse/HIVE-352
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: He Yongqiang
> Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch,
> hive-352-2009-4-17.patch, HIve-352-draft-2009-03-28.patch,
> Hive-352-draft-2009-03-30.patch
>
>
> Column based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row oriented storage. In this issue, we will
> enhance Hive to support column based storage.
> Actually, we have done some work on column based storage on top of HDFS. I
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?