[ 
https://issues.apache.org/jira/browse/SPARK-38041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483518#comment-17483518
 ] 

Hyukjin Kwon commented on SPARK-38041:
--------------------------------------

[~Jackey Lee] mind showing a self-contained reproducer please?

> DataFilter pushed down with PartitionFilter
> -------------------------------------------
>
>                 Key: SPARK-38041
>                 URL: https://issues.apache.org/jira/browse/SPARK-38041
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Jackey Lee
>            Priority: Major
>
> At present, a Filter is split into DataFilters and PartitionFilters when it 
> is pushed down. However, once the PartitionFilter predicates are removed from 
> the Filter, every partition is scanned against all of the remaining 
> DataFilter conditions, which may cause a full data scan.
> Here is an example.
> before
> {code:java}
> == Physical Plan ==
> *(1) Filter (((a#0 < 10) AND (c#2 = 0)) OR (((a#0 >= 10) AND (c#2 >= 1)) AND 
> (c#2 < 3)))
> +- *(1) ColumnarToRow
>    +- FileScan parquet datasources.test_push_down[a#0,b#1,c#2] Batched: true, 
> DataFilters: [((a#0 < 10) OR (a#0 >= 10))], Format: Parquet, Location: 
> InMemoryFileIndex(0 paths)[], PartitionFilters: [((c#2 = 0) OR ((c#2 >= 1) 
> AND (c#2 < 3)))], PushedFilters: 
> [Or(LessThan(a,10),GreaterThanOrEqual(a,10))], ReadSchema: 
> struct<a:int,b:int> {code}
> after
> {code:java}
> == Physical Plan ==
> *(1) Filter (((a#0 < 10) AND (c#2 = 0)) OR (((a#0 >= 10) AND (c#2 >= 1)) AND 
> (c#2 < 3)))
> +- *(1) ColumnarToRow
>    +- FileScan parquet datasources.test_push_down[a#0,b#1,c#2] Batched: true, 
> DataFilters: [(((a#0 < 10) AND (c#2 = 0)) OR (((a#0 >= 10) AND (c#2 >= 1)) 
> AND (c#2 < 3)))], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], 
> PartitionFilters: [((c#2 = 0) OR ((c#2 >= 1) AND (c#2 < 3)))], PushedFilters: 
> [Or(LessThan(a,10),GreaterThanOrEqual(a,10))], ReadSchema: 
> struct<a:int,b:int>  {code}
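
The split shown in the "before" plan can be modelled with a small, self-contained Python sketch. This is not Spark's implementation: the classes `Pred`, `And`, `Or` and the function `extract` are hypothetical names introduced for illustration, and the rule used here (an AND keeps whichever side survives, an OR survives only if both sides do) is an assumption about how a predicate over a subset of columns is derived. Under that assumption, the data-column part of the condition collapses to `(a < 10) OR (a >= 10)`, which holds for every non-null `a` and therefore filters nothing inside the selected partitions.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union


@dataclass(frozen=True)
class Pred:
    """A leaf comparison such as (a < 10)."""
    col: str
    op: str
    value: int

    def __str__(self) -> str:
        return f"({self.col} {self.op} {self.value})"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"

    def __str__(self) -> str:
        return f"({self.left} AND {self.right})"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"

    def __str__(self) -> str:
        return f"({self.left} OR {self.right})"


Expr = Union[Pred, And, Or]


def extract(expr: Expr, cols: Set[str]) -> Optional[Expr]:
    """Return a weaker predicate that only references `cols`, or None.

    Assumed rule (illustrative, not taken from Spark's source):
      - leaf: kept only if its column is in `cols`
      - AND:  keep both sides if both survive, else whichever survives
      - OR:   kept only if both sides survive
    """
    if isinstance(expr, Pred):
        return expr if expr.col in cols else None
    left = extract(expr.left, cols)
    right = extract(expr.right, cols)
    if isinstance(expr, And):
        if left is not None and right is not None:
            return And(left, right)
        return left if left is not None else right
    # Or: dropping one branch would make the predicate stronger, so give up.
    if left is not None and right is not None:
        return Or(left, right)
    return None


# The condition from the plan; c is the partition column, a and b are data columns.
condition = Or(
    And(Pred("a", "<", 10), Pred("c", "=", 0)),
    And(And(Pred("a", ">=", 10), Pred("c", ">=", 1)), Pred("c", "<", 3)),
)

partition_filter = extract(condition, {"c"})
data_filter = extract(condition, {"a", "b"})

print(partition_filter)  # ((c = 0) OR ((c >= 1) AND (c < 3)))
print(data_filter)       # ((a < 10) OR (a >= 10))
```

The two printed predicates match the `PartitionFilters` and `DataFilters` entries of the "before" plan. The "after" plan instead keeps the whole original condition in `DataFilters`.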



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

