[
https://issues.apache.org/jira/browse/SPARK-38041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483518#comment-17483518
]
Hyukjin Kwon commented on SPARK-38041:
--------------------------------------
[~Jackey Lee] mind showing a self-contained reproducer please?
> DataFilter pushed down with PartitionFilter
> -------------------------------------------
>
> Key: SPARK-38041
> URL: https://issues.apache.org/jira/browse/SPARK-38041
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Jackey Lee
> Priority: Major
>
> At present, a Filter is split into DataFilters and PartitionFilters when it
> is pushed down. However, once the partition predicates are removed from the
> Filter, every partition is scanned with the same remaining DataFilters,
> which may cause a full data scan.
> Here is an example.
> before
> {code:java}
> == Physical Plan ==
> *(1) Filter (((a#0 < 10) AND (c#2 = 0)) OR (((a#0 >= 10) AND (c#2 >= 1)) AND (c#2 < 3)))
> +- *(1) ColumnarToRow
>    +- FileScan parquet datasources.test_push_down[a#0,b#1,c#2] Batched: true, DataFilters: [((a#0 < 10) OR (a#0 >= 10))], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], PartitionFilters: [((c#2 = 0) OR ((c#2 >= 1) AND (c#2 < 3)))], PushedFilters: [Or(LessThan(a,10),GreaterThanOrEqual(a,10))], ReadSchema: struct<a:int,b:int>
> {code}
> after
> {code:java}
> == Physical Plan ==
> *(1) Filter (((a#0 < 10) AND (c#2 = 0)) OR (((a#0 >= 10) AND (c#2 >= 1)) AND (c#2 < 3)))
> +- *(1) ColumnarToRow
>    +- FileScan parquet datasources.test_push_down[a#0,b#1,c#2] Batched: true, DataFilters: [(((a#0 < 10) AND (c#2 = 0)) OR (((a#0 >= 10) AND (c#2 >= 1)) AND (c#2 < 3)))], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], PartitionFilters: [((c#2 = 0) OR ((c#2 >= 1) AND (c#2 < 3)))], PushedFilters: [Or(LessThan(a,10),GreaterThanOrEqual(a,10))], ReadSchema: struct<a:int,b:int>
> {code}
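
The splitting behaviour described in the issue can be illustrated with a small, Spark-free sketch. This is a simplified model written for illustration only, not Spark's actual implementation: the tuple-based expression encoding and the helper names (`leaf`, `extract`, `evaluate`, `show`) are all hypothetical. It projects the example predicate onto the data column `a` and the partition column `c`, and shows that the data-only projection accepts every non-null value of `a`.

```python
import operator

# Hypothetical mini expression tree: ("leaf", col, op, value), ("and", l, r), ("or", l, r).
OPS = {"<": operator.lt, ">=": operator.ge, "=": operator.eq}

def leaf(col, op, value):
    return ("leaf", col, op, value)

def extract(expr, cols):
    """Return a predicate implied by expr that references only cols, or None."""
    if expr[0] == "leaf":
        return expr if expr[1] in cols else None
    left, right = extract(expr[1], cols), extract(expr[2], cols)
    if expr[0] == "and":
        # Either side of a conjunction may be dropped without losing rows.
        if left is not None and right is not None:
            return ("and", left, right)
        return left if left is not None else right
    # A disjunction can only be kept if both sides survive.
    if left is not None and right is not None:
        return ("or", left, right)
    return None

def evaluate(expr, row):
    if expr[0] == "leaf":
        _, col, op, value = expr
        return OPS[op](row[col], value)
    l, r = evaluate(expr[1], row), evaluate(expr[2], row)
    return (l and r) if expr[0] == "and" else (l or r)

def show(expr):
    if expr[0] == "leaf":
        return f"({expr[1]} {expr[2]} {expr[3]})"
    joiner = " AND " if expr[0] == "and" else " OR "
    return "(" + show(expr[1]) + joiner + show(expr[2]) + ")"

# ((a < 10) AND (c = 0)) OR (((a >= 10) AND (c >= 1)) AND (c < 3))
full = ("or",
        ("and", leaf("a", "<", 10), leaf("c", "=", 0)),
        ("and", ("and", leaf("a", ">=", 10), leaf("c", ">=", 1)), leaf("c", "<", 3)))

data_filter = extract(full, {"a"})
partition_filter = extract(full, {"c"})

print(show(data_filter))       # ((a < 10) OR (a >= 10))
print(show(partition_filter))  # ((c = 0) OR ((c >= 1) AND (c < 3)))

# Within partition c = 0, the full predicate keeps only a < 10,
# while the data-only projection keeps every value of a.
kept_full = [a for a in range(8, 12) if evaluate(full, {"a": a, "c": 0})]
kept_data = [a for a in range(8, 12) if evaluate(data_filter, {"a": a, "c": 0})]
print(kept_full, kept_data)    # [8, 9] [8, 9, 10, 11]
```

Under this model the two projections match the `DataFilters` and `PartitionFilters` shown in the "before" plan, which is why the data filter alone cannot prune anything inside a selected partition.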