[ https://issues.apache.org/jira/browse/HIVE-21632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822110#comment-16822110 ]
Vineet Garg commented on HIVE-21632:
------------------------------------
Duplicate of HIVE-21599?
> Hive should not push partition columns to the Parquet predicate, even if the
> data file contains the partition column
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-21632
> URL: https://issues.apache.org/jira/browse/HIVE-21632
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Marta Kuczora
> Priority: Minor
>
> If a partitioned Parquet table in Hive has a data file in one of its
> partitions that (incorrectly) contains the partition column as well,
> filtering on the partition column returns no rows when Parquet predicate
> pushdown (PPD) is enabled. With PPD disabled, the rows are returned
> correctly.
> This fails because, with PPD switched on, Hive sends Parquet the predicate
> 'partition_column = ...' together with a requested schema that doesn't
> contain the partition column. When the data is read from Parquet, this
> column is skipped because the requested schema doesn't contain it, but the
> filter predicate is still applied, so the result set is empty.
> I think if the rows are returned correctly without PPD, they should be
> returned with PPD as well. Hive should omit the partition column from the
> Parquet predicate.
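
The behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hive's or Parquet's actual code: the reader is modelled as "project to the requested schema, then apply an equality filter", a column absent from the projection is assumed to read as None, and every name here (read_with_pushdown, strip_partition_columns, the "part" column) is hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]
# The predicate is modelled as a conjunction of (column, value) equality leaves.
Predicate = List[Tuple[str, Any]]


def read_with_pushdown(file_rows: List[Row],
                       requested_columns: List[str],
                       predicate: Predicate) -> List[Row]:
    """Toy reader: project each row to the requested schema, then filter it."""
    result = []
    for row in file_rows:
        projected = {c: row.get(c) for c in requested_columns}
        # Assumption: a column missing from the projection reads as None,
        # so an equality test against a real value never matches.
        if all(projected.get(col) == value for col, value in predicate):
            result.append(projected)
    return result


def strip_partition_columns(predicate: Predicate,
                            partition_columns: Set[str]) -> Predicate:
    """Drop every leaf that refers to a partition column (the proposed fix)."""
    return [(col, value) for col, value in predicate
            if col not in partition_columns]


# The data file (incorrectly) contains the partition column "part".
file_rows = [{"id": 1, "part": "a"}, {"id": 2, "part": "a"}]
predicate = [("part", "a")]   # filter on the partition column
requested = ["id"]            # requested schema omits the partition column

# Pushing the predicate as-is filters out every row.
broken = read_with_pushdown(file_rows, requested, predicate)

# Omitting the partition column from the pushed predicate returns the rows.
fixed = read_with_pushdown(
    file_rows, requested, strip_partition_columns(predicate, {"part"}))
```

In this model the rows of a partition are already selected by partition pruning, so dropping the partition-column leaf from the pushed predicate does not change which rows the query should return.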