[ 
https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380194#comment-15380194
 ] 

Xuefu Zhang commented on HIVE-13873:
------------------------------------

Thanks, [~Ferd]. Regarding #3: for arrays, there is probably nothing to do. 
A map is probably encoded as an array of structs of key and value, so there 
might be nothing to do there either (Hive has no way to get all keys or all 
values of a map). Thus, we are probably good on that point.

While you're doing this work, it would be great to check whether it yields any 
performance gain. Similar work done for Presto saw highly selective 
projections run a few times faster.

> Column pruning for nested fields
> --------------------------------
>
>                 Key: HIVE-13873
>                 URL: https://issues.apache.org/jira/browse/HIVE-13873
>             Project: Hive
>          Issue Type: New Feature
>          Components: Logical Optimizer
>            Reporter: Xuefu Zhang
>            Assignee: Ferdinand Xu
>         Attachments: HIVE-13873.wip.patch
>
>
> Some columnar file formats, such as Parquet, also store the fields of a 
> struct column by column, using the encoding described in the Google Dremel 
> paper. It is very common in big data for data to be stored in structs while 
> queries need only a subset of the fields in those structs. However, Hive 
> presently still needs to read the whole struct regardless of whether all 
> fields are selected. Therefore, pruning unwanted sub-fields of structs or 
> nested fields at file reading time would be a big performance boost for such 
> scenarios.
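
To make the idea in the description concrete, here is a minimal sketch of 
nested-field pruning over a toy schema. It is not Hive or Parquet code: the 
schema layout (dicts for structs, strings for leaf types), the dotted-path 
syntax, and the name prune_schema are all assumptions made for illustration. 
Arrays and maps are treated as opaque leaves, in line with the comment above 
that there is probably nothing to prune inside them.

```python
def prune_schema(schema, paths):
    """Keep only the fields of `schema` that are named by the dotted `paths`.

    Hypothetical model: a struct is a dict of field name -> type, and any
    non-dict value (e.g. "string", "map<string,int>") is an opaque leaf.
    """
    # Group requested paths by top-level field.
    # None means "keep the whole field"; a list holds the remaining sub-paths.
    wanted = {}
    for path in paths:
        head, _, rest = path.partition(".")
        if head not in schema:
            raise KeyError(f"unknown field: {path}")
        if not rest:
            wanted[head] = None
        elif head not in wanted:
            wanted[head] = [rest]
        elif wanted[head] is not None:
            wanted[head].append(rest)

    # Walk the schema in its own order so the pruned schema keeps field order.
    pruned = {}
    for name, child in schema.items():
        if name not in wanted:
            continue
        sub_paths = wanted[name]
        if sub_paths is None:
            pruned[name] = child
        elif isinstance(child, dict):
            pruned[name] = prune_schema(child, sub_paths)
        else:
            raise KeyError(f"{name} is not a struct, cannot select {sub_paths}")
    return pruned


# Toy table schema (assumed for illustration only).
schema = {
    "id": "bigint",
    "user": {
        "name": "string",
        "address": {"city": "string", "zip": "string"},
    },
    "attrs": "map<string,int>",
}

# A query such as SELECT id, user.address.city would need only:
projected = prune_schema(schema, ["id", "user.address.city"])
# projected == {"id": "bigint", "user": {"address": {"city": "string"}}}
```

A reader that is handed the pruned schema instead of the full one can skip 
the column chunks for user.name, user.address.zip and attrs entirely, which 
is where the expected gain for highly selective projections would come from.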



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
