[ 
https://issues.apache.org/jira/browse/HIVE-21599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633876#comment-17633876
 ] 

Stamatis Zampetakis commented on HIVE-21599:
--------------------------------------------

I think the problem/bug is a bit more general than partition columns appearing 
inside the Parquet schema.

Pushing a predicate down into a Parquet file is not a problem per se, even if 
it references partitioning columns. What is problematic is applying a predicate 
to missing/pruned columns. This can lead to wrong results since missing columns 
are populated with null values.
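
To make the failure mode concrete, here is a minimal toy sketch. It does not use 
the Parquet or Hive APIs; the class {{PrunedColumnFilterDemo}} and its methods 
are hypothetical and only model a reader that returns null for every column 
that is not requested.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: not the Parquet reader, just an illustration of the effect.
final class PrunedColumnFilterDemo {

    // Simulates reading one stored row through a pruned schema:
    // columns that were not requested are populated with null.
    static Map<String, Object> readPruned(Map<String, Object> stored, Set<String> requested) {
        Map<String, Object> row = new HashMap<>();
        for (Map.Entry<String, Object> e : stored.entrySet()) {
            row.put(e.getKey(), requested.contains(e.getKey()) ? e.getValue() : null);
        }
        return row;
    }

    // Counts the rows for which "column = expected" holds after pruning.
    static long countMatches(List<Map<String, Object>> file, Set<String> requested,
                             String column, Object expected) {
        long matches = 0;
        for (Map<String, Object> stored : file) {
            Object actual = readPruned(stored, requested).get(column);
            if (expected.equals(actual)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> file = List.of(
            Map.<String, Object>of("id", 1, "part", "a"),
            Map.<String, Object>of("id", 2, "part", "b"));

        // Column "part" is requested: the predicate sees the real values.
        System.out.println(countMatches(file, Set.of("id", "part"), "part", "a")); // prints 1

        // Column "part" is pruned: the predicate sees null and drops every row.
        System.out.println(countMatches(file, Set.of("id"), "part", "a")); // prints 0
    }
}
```

The second call filters out a row that should have matched, which is the kind of 
wrong result described above.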

Since the predicates are applied after column pruning, we should ensure that we 
don't create predicates for columns that are removed/pruned. One way to achieve 
this is to use the same schema for both the predicate push-down and the column 
pruning optimizations.

In Parquet, the column pruning information is captured by the 
{{ReadContext#getRequestedSchema}} method, so it suffices to use this method 
consistently.
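
A rough sketch of the idea, again as a toy model rather than the actual patch: 
the pushed-down filter is built only from conjuncts whose column is present in 
the requested (pruned) schema. The names {{PushdownSchemaGuard}}, {{Leaf}} and 
{{pushable}} are hypothetical, and the set of column names stands in for 
whatever {{ReadContext#getRequestedSchema}} returns. It also assumes the 
predicate is a conjunction and that the engine still evaluates the full 
predicate above the scan, so dropping a conjunct from the pushed filter only 
makes it less selective.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: illustrates building the pushed filter from the pruned schema.
final class PushdownSchemaGuard {

    // One "column = value" term of a conjunctive predicate.
    static final class Leaf {
        final String column;
        final Object value;

        Leaf(String column, Object value) {
            this.column = column;
            this.value = value;
        }
    }

    // Keeps only the conjuncts that refer to columns in the requested schema.
    // Conjuncts on pruned columns are left to the engine's own filter.
    static List<Leaf> pushable(List<Leaf> conjuncts, Set<String> requestedColumns) {
        List<Leaf> kept = new ArrayList<>();
        for (Leaf leaf : conjuncts) {
            if (requestedColumns.contains(leaf.column)) {
                kept.add(leaf);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Leaf> predicate = List.of(new Leaf("id", 1), new Leaf("part", "a"));

        // Only "id" survives pruning, so only the "id" conjunct is pushed down.
        for (Leaf leaf : pushable(predicate, Set.of("id"))) {
            System.out.println(leaf.column + " = " + leaf.value); // prints "id = 1"
        }
    }
}
```

Because both decisions read the same schema, a predicate can never be created 
for a column that the reader will later fill with nulls.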

> Parquet predicate pushdown on partition columns may cause wrong result if 
> files contain partition columns
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21599
>                 URL: https://issues.apache.org/jira/browse/HIVE-21599
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21599.1.patch
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Filter predicates are pushed to Table Scan (to be pushed to and used by 
> storage handler/input format). Such predicates could consist of partition 
> columns, which are of no use to storage handlers or input formats. Therefore, 
> they should be removed from the TS filter expression.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
