[
https://issues.apache.org/jira/browse/SPARK-27698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-27698:
-----------------------------------
Description:
To return accurate pushed filters in the Parquet file
scan (https://github.com/apache/spark/pull/24327#pullrequestreview-234775673),
we can process the original data source filters as follows:
1. For "And" operators, split the conjunctive predicates and try converting
each of them. After that:
1.1 if partial predicate pushdown is allowed, return the convertible results;
1.2 otherwise, return the whole predicate if it is convertible, or an empty
result if it is not.
2. Other operators cannot be partially pushed down.
2.1 if the entire predicate is convertible, return it as-is;
2.2 otherwise, return an empty result.
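A minimal Scala sketch of this conversion strategy, assuming a `createFilter` helper that returns `Some` only for fully convertible filters (the `convertibleFilters` name and signature here are illustrative, not necessarily the final API):

```scala
import org.apache.parquet.filter2.predicate.FilterPredicate
import org.apache.spark.sql.sources

// Hypothetical sketch: `convertibleFilters` and the injected `createFilter`
// function are illustrative, not the actual Spark code.
def convertibleFilters(
    filters: Seq[sources.Filter],
    canPartialPushDown: Boolean,
    createFilter: sources.Filter => Option[FilterPredicate]): Seq[sources.Filter] = {
  filters.flatMap {
    // 1. "And": split the conjunction and try converting each side.
    case sources.And(left, right) =>
      val convertedLeft = convertibleFilters(Seq(left), canPartialPushDown, createFilter)
      val convertedRight = convertibleFilters(Seq(right), canPartialPushDown, createFilter)
      if (canPartialPushDown) {
        // 1.1 partial pushdown allowed: keep whichever sides are convertible.
        convertedLeft ++ convertedRight
      } else if (convertedLeft.nonEmpty && convertedRight.nonEmpty) {
        // 1.2 both sides fully convertible: push the whole predicate.
        Seq(sources.And(left, right))
      } else {
        // 1.2 otherwise: return an empty result.
        Nil
      }
    // 2. Other operators are all-or-nothing.
    case other =>
      if (createFilter(other).isDefined) Seq(other) else Nil
  }
}
```

Note that when partial pushdown is disallowed, the recursive calls also return either the whole nested predicate or nothing, so the non-empty checks are enough to decide whether an "And" is fully convertible.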
This PR also contains code refactoring. Currently `ParquetFilters.createFilter`
accepts the parameter `schema: MessageType` and creates a field mapping for
every input filter. We can make the schema a class member and avoid recreating
the `nameToParquetField` mapping for every input filter.
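As a minimal sketch of the proposed refactoring (the constructor shape and the map's value type are assumptions, not the actual Spark code), the schema becomes a constructor argument so `nameToParquetField` is computed once per `ParquetFilters` instance:

```scala
import scala.collection.JavaConverters._

import org.apache.parquet.filter2.predicate.FilterPredicate
import org.apache.parquet.schema.{MessageType, PrimitiveType}
import org.apache.spark.sql.sources

// Illustrative shape only: the real class carries more state and conversion rules.
class ParquetFilters(schema: MessageType) {

  // Built once per ParquetFilters instance instead of once per input filter.
  private val nameToParquetField: Map[String, PrimitiveType] =
    schema.getFields.asScala.collect {
      case field if field.isPrimitive => field.getName -> field.asPrimitiveType()
    }.toMap

  // The schema parameter is gone from the signature; the method reads the
  // precomputed nameToParquetField instead.
  def createFilter(predicate: sources.Filter): Option[FilterPredicate] = {
    // conversion logic elided in this sketch
    None
  }
}
```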
was:
To return accurate pushed filters in the Parquet file
scan (https://github.com/apache/spark/pull/24327#pullrequestreview-234775673),
we can process the original data source filters as follows:
1. For "And" operators, split the conjunctive predicates and try converting
each of them. After that:
1.1 if partial predicate pushdown is allowed, return the convertible results;
1.2 otherwise, return the whole predicate if it is convertible, or an empty
result if it is not.
2. Other operators are either entirely pushed down or not pushed down at all;
in the current pushdown strategy, the "Non-And" operators cannot be partially
pushed down.
> Add new method for getting pushed down filters in Parquet file reader
> ---------------------------------------------------------------------
>
> Key: SPARK-27698
> URL: https://issues.apache.org/jira/browse/SPARK-27698
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Gengliang Wang
> Priority: Major
>
> To return accurate pushed filters in the Parquet file
> scan (https://github.com/apache/spark/pull/24327#pullrequestreview-234775673),
> we can process the original data source filters as follows:
> 1. For "And" operators, split the conjunctive predicates and try converting
> each of them. After that:
> 1.1 if partial predicate pushdown is allowed, return the convertible results;
> 1.2 otherwise, return the whole predicate if it is convertible, or an empty
> result if it is not.
> 2. Other operators cannot be partially pushed down.
> 2.1 if the entire predicate is convertible, return it as-is;
> 2.2 otherwise, return an empty result.
> This PR also contains code refactoring. Currently `ParquetFilters.createFilter`
> accepts the parameter `schema: MessageType` and creates a field mapping for
> every input filter. We can make the schema a class member and avoid recreating
> the `nameToParquetField` mapping for every input filter.