[
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689796#action_12689796
]
He Yongqiang commented on HIVE-352:
-----------------------------------
Also, the cost of tuple reconstruction accounts for a large proportion of the
whole execution time. In our initial experiments, the reconstruction cost was
much higher than the benefit of integrating column-oriented execution with the
underlying column storage. The reconstruction is a Map-Reduce join operation.
For some queries the cost can be greatly reduced if we reduce the number of
tuples that need to be reconstructed. The key to this is late materialization.
The current B2.2 design, however, localizes rows in a single file and adopts
record-level columnar storage, so it does not have the tuple reconstruction
cost. It does need more specific and more flexible compression algorithms, and
I strongly recommend supporting bitmap files in the future. To get the main
benefit of a columnar strategy, we will also need to add some columnar
operators next. But for now let us take the first step, and then add more
optimizations.
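
To make the late materialization point concrete, here is a minimal sketch in
Python. It is not Hive code: the in-memory column layout, the function names,
and the sample data are all assumptions made for illustration. It compares
reconstructing every tuple before filtering with filtering on one column first
and reconstructing only the qualifying rows.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory column store: one list per column, aligned by row position.
Columns = Dict[str, List]


def early_materialize(columns: Columns, names: List[str], predicate_col: str,
                      predicate: Callable) -> Tuple[List[tuple], int]:
    """Reconstruct every tuple first, then apply the predicate."""
    n = len(columns[predicate_col])
    rows = [tuple(columns[c][i] for c in names) for i in range(n)]
    idx = names.index(predicate_col)
    kept = [r for r in rows if predicate(r[idx])]
    # Second value: how many tuples had to be reconstructed.
    return kept, len(rows)


def late_materialize(columns: Columns, names: List[str], predicate_col: str,
                     predicate: Callable) -> Tuple[List[tuple], int]:
    """Scan only the predicate column, then reconstruct the surviving rows."""
    positions = [i for i, v in enumerate(columns[predicate_col]) if predicate(v)]
    rows = [tuple(columns[c][i] for c in names) for i in positions]
    return rows, len(rows)


# Made-up sample data for the comparison.
columns = {
    "id": [1, 2, 3, 4],
    "price": [10, 200, 30, 400],
    "name": ["a", "b", "c", "d"],
}
names = ["id", "price", "name"]

early_rows, early_count = early_materialize(columns, names, "price", lambda p: p > 100)
late_rows, late_count = late_materialize(columns, names, "price", lambda p: p > 100)
```

Both functions return the same rows, but the late variant reconstructs only
the tuples that pass the filter. In the Map-Reduce setting described above,
that corresponds to fewer tuples going through the reconstruction join.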
> Make Hive support column based storage
> --------------------------------------
>
> Key: HIVE-352
> URL: https://issues.apache.org/jira/browse/HIVE-352
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: He Yongqiang
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will
> enhance Hive to support column-based storage.
> Actually, we have done some work on column-based storage on top of HDFS; I
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?