[ 
https://issues.apache.org/jira/browse/HIVE-21632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora resolved HIVE-21632.
----------------------------------
    Resolution: Duplicate

> Hive should not push partition columns to the Parquet predicate, even if the 
> data file contains the partition column
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21632
>                 URL: https://issues.apache.org/jira/browse/HIVE-21632
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Priority: Minor
>
> If a partitioned Parquet table in Hive has a data file in one of its 
> partitions that (incorrectly) also contains the partition column, 
> filtering on the partition column returns no rows when Parquet predicate 
> pushdown (PPD) is enabled. With PPD disabled, the rows are returned 
> correctly.
> The reason is that, with PPD switched on, Hive sends Parquet the predicate 
> 'partition_column= ...' together with a requested schema that doesn't 
> contain the partition column. When the data is read from Parquet, this 
> column is skipped because the requested schema doesn't contain it, but the 
> filter predicate on it is still applied, so the result set comes back 
> empty.
> I think if the rows are returned correctly without PPD, they should be 
> returned with PPD as well. Hive should omit the partition column from the 
> Parquet predicate.
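
The behaviour described above can be illustrated with a small toy model. This is only a sketch and not Hive or Parquet code: the functions read_rows and drop_partition_columns, the column names "id" and "part", and the rule that a column missing from the projected row fails any condition on it are all assumptions made for illustration. It shows a predicate on a column that is absent from the requested schema filtering out every row, and the same read succeeding once the partition column is omitted from the predicate.

```python
# Toy model only: this is NOT Hive or Parquet code. All names are hypothetical.

def read_rows(file_rows, requested_columns, predicate):
    """Project each row to the requested columns, then apply the predicate.

    predicate is a list of (column, value) equality conditions that must all
    hold. A column missing from the projected row is looked up as None, so
    any condition on it fails (an assumption standing in for the reported
    behaviour).
    """
    result = []
    for row in file_rows:
        projected = {c: row[c] for c in requested_columns if c in row}
        if all(projected.get(col) == val for col, val in predicate):
            result.append(projected)
    return result


def drop_partition_columns(predicate, partition_columns):
    """Return the predicate without conditions on partition columns."""
    return [(col, val) for col, val in predicate if col not in partition_columns]


# The data file (incorrectly) contains the partition column "part".
file_rows = [{"id": 1, "part": "a"}, {"id": 2, "part": "a"}]
requested = ["id"]           # requested schema omits the partition column
predicate = [("part", "a")]  # filter on the partition column

with_ppd = read_rows(file_rows, requested, predicate)
fixed = read_rows(file_rows, requested,
                  drop_partition_columns(predicate, {"part"}))

print(with_ppd)  # []
print(fixed)     # [{'id': 1}, {'id': 2}]
```

In this sketch, dropping the condition loses nothing as long as the partition itself was already selected by its partition value, which is the usual situation for a filter on a partition column.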



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
