[jira] [Created] (SPARK-38041) DataFilter pushed down with PartitionFilter

Jackey Lee (Jira) Wed, 26 Jan 2022 20:11:07 -0800

Jackey Lee created SPARK-38041:
----------------------------------

             Summary: DataFilter pushed down with PartitionFilter
                 Key: SPARK-38041
                 URL: https://issues.apache.org/jira/browse/SPARK-38041
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Jackey Lee



At present, a Filter is split into DataFilters and PartitionFilters when it is 
pushed down. Once the partition predicates are stripped out of the Filter, the 
remaining DataFilters are applied in the same form to every selected partition, 
which may cause a full data scan.

Here is an example.

before
{code:java}
== Physical Plan ==
*(1) Filter (((a#0 < 10) AND (c#2 = 0)) OR (((a#0 >= 10) AND (c#2 >= 1)) AND 
(c#2 < 3)))
+- *(1) ColumnarToRow
   +- FileScan parquet datasources.test_push_down[a#0,b#1,c#2] Batched: true, 
DataFilters: [((a#0 < 10) OR (a#0 >= 10))], Format: Parquet, Location: 
InMemoryFileIndex(0 paths)[], PartitionFilters: [((c#2 = 0) OR ((c#2 >= 1) AND 
(c#2 < 3)))], PushedFilters: [Or(LessThan(a,10),GreaterThanOrEqual(a,10))], 
ReadSchema: struct<a:int,b:int>
{code}
after
{code:java}
== Physical Plan ==
*(1) Filter (((a#0 < 10) AND (c#2 = 0)) OR (((a#0 >= 10) AND (c#2 >= 1)) AND 
(c#2 < 3)))
+- *(1) ColumnarToRow
   +- FileScan parquet datasources.test_push_down[a#0,b#1,c#2] Batched: true, 
DataFilters: [(((a#0 < 10) AND (c#2 = 0)) OR (((a#0 >= 10) AND (c#2 >= 1)) AND 
(c#2 < 3)))], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], 
PartitionFilters: [((c#2 = 0) OR ((c#2 >= 1) AND (c#2 < 3)))], PushedFilters: 
[Or(LessThan(a,10),GreaterThanOrEqual(a,10))], ReadSchema: struct<a:int,b:int>
{code}
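
To illustrate why the two DataFilters differ in selectivity, here is a toy 
model in plain Python. It is not Spark code. It assumes each chunk of data is 
summarized by its partition value c and the min/max of column a, and that a 
reader could skip a chunk when the filter cannot match those bounds. RowGroup, 
may_match_before and may_match_after are hypothetical names, NULL handling is 
ignored, and how Spark would actually use the new DataFilters is not stated in 
this issue (PushedFilters is the same in both plans).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical summary of a chunk of data: partition value and bounds of a."""
    c: int
    a_min: int
    a_max: int


def may_match_before(rg: RowGroup) -> bool:
    # DataFilters before: (a < 10) OR (a >= 10)
    # True for any chunk that holds at least one non-null value of a.
    return rg.a_min < 10 or rg.a_max >= 10


def may_match_after(rg: RowGroup) -> bool:
    # DataFilters after:
    # ((a < 10) AND (c = 0)) OR ((a >= 10) AND (c >= 1) AND (c < 3))
    left = rg.a_min < 10 and rg.c == 0
    right = rg.a_max >= 10 and 1 <= rg.c < 3
    return left or right


# All four chunks pass the PartitionFilters (c = 0, or 1 <= c < 3).
row_groups = [
    RowGroup(c=0, a_min=0, a_max=5),
    RowGroup(c=0, a_min=20, a_max=30),
    RowGroup(c=1, a_min=0, a_max=5),
    RowGroup(c=2, a_min=20, a_max=30),
]

before = [rg for rg in row_groups if may_match_before(rg)]
after = [rg for rg in row_groups if may_match_after(rg)]

print(len(before), "chunks kept by the old DataFilters")
print(len(after), "chunks kept by the new DataFilters")
```

In this model the old DataFilters cannot rule out any chunk, while the filter 
that keeps the partition conditions rules out chunks whose values of a cannot 
match the branch that applies to their partition.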



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
