[
https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374184#comment-15374184
]
Xuefu Zhang commented on HIVE-13873:
------------------------------------
[~Ferd], thanks for working on this. I went through the patch and it looks good
as an initial cut. Here are a few preliminary thoughts to share with you:
1. Nested column pruning should go beyond just the select or group-by operator.
For instance,
{code}
select msg.a from t where msg.b = 'x';
{code}
In this case, the Parquet reader should read only a and b from the msg field.
Thus, I think we need to consider expressions from more operators, such as the
filter here.
2. Secondly, we may need a consolidation/merging step when determining the
final read schema. For instance,
{code}
select msg from t where msg.a='x';
{code}
In this case, the projected column should be just msg rather than msg + msg.a.
3. While it's fine to support just struct at first, we may need to find a more
extensible way to pass the projected fields to the reader so that other types
(array and map) can be supported later. I don't have a concrete idea for this
yet, so I'd love to hear your thoughts.
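To make points 1 and 2 concrete, here is a minimal sketch of the consolidation
step, written in Python for brevity. It is only an illustration, not what the
patch does: the function name merge_nested_paths and the use of dotted strings
such as "msg.a" to represent nested fields are assumptions made for this
example. The input is the set of paths collected from all operators (select,
filter, and so on), and the output drops any path already covered by a selected
ancestor.

```python
def merge_nested_paths(paths):
    """Drop every path that is already covered by a selected ancestor.

    `paths` holds dotted field references gathered from all operators,
    e.g. "msg.a" from the select list and "msg.b" from the filter.
    (Hypothetical representation, chosen only for this sketch.)
    """
    kept = set()
    # Visit shallow paths first so ancestors are decided before descendants.
    for path in sorted(set(paths), key=lambda p: p.count(".")):
        parts = path.split(".")
        # All proper prefixes of the path: "msg.a.x" -> {"msg", "msg.a"}.
        ancestors = {".".join(parts[:i]) for i in range(1, len(parts))}
        if not ancestors & kept:
            kept.add(path)
    return sorted(kept)


# Point 1: the select and the filter each contribute one sub-field.
print(merge_nested_paths(["msg.a", "msg.b"]))  # ['msg.a', 'msg.b']
# Point 2: the whole struct is selected, so msg.a adds nothing.
print(merge_nested_paths(["msg", "msg.a"]))    # ['msg']
```

A flat list of dotted paths like this only works for structs; array and map
types (point 3) would probably need a richer representation.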
> Column pruning for nested fields
> --------------------------------
>
> Key: HIVE-13873
> URL: https://issues.apache.org/jira/browse/HIVE-13873
> Project: Hive
> Issue Type: New Feature
> Components: Logical Optimizer
> Reporter: Xuefu Zhang
> Assignee: Ferdinand Xu
> Attachments: HIVE-13873.wip.patch
>
>
> Some columnar file formats such as Parquet also store the fields of a struct
> type column by column, using the encoding described in the Google Dremel
> paper. It's very common in big data for data to be stored in structs while
> queries need only a subset of the fields in those structs. However, Hive
> presently still needs to read the whole struct regardless of whether all
> fields are selected. Therefore, pruning unwanted sub-fields in struct or
> nested fields at file reading time would be a big performance boost for such
> scenarios.