This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 10f7f6e  [SPARK-36454][SQL] Not push down partition filter to ORCScan for DSv2
10f7f6e is described below
commit 10f7f6e62b4b0b00a54b4b85848772cb5234183c
Author: Huaxin Gao <[email protected]>
AuthorDate: Mon Aug 9 10:47:03 2021 -0700
[SPARK-36454][SQL] Not push down partition filter to ORCScan for DSv2
### What changes were proposed in this pull request?
Stop pushing down partition filters to `ORCScan` for DSv2.
### Why are the changes needed?
Partition filters are only used for partition pruning and shouldn't be pushed down to `ORCScan`. We already don't push partition filters down to `ORCScan` in DSv1:
```
== Physical Plan ==
*(1) Filter (isnotnull(value#19) AND NOT (value#19 = a))
+- *(1) ColumnarToRow
   +- FileScan orc [value#19,p1#20,p2#21] Batched: true, DataFilters: [isnotnull(value#19), NOT (value#19 = a)], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/pt/_5f4sxy56x70dv9zpz032f0m0000gn/T/spark-c1..., PartitionFilters: [isnotnull(p1#20), isnotnull(p2#21), (p1#20 = 1), (p2#21 = 2)], PushedFilters: [IsNotNull(value), Not(EqualTo(value,a))], ReadSchema: struct<value:string>
```
Also, we don't push down partition filters for Parquet in DSv2:
https://github.com/apache/spark/pull/30652
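Conceptually, DSv2 should behave like DSv1 here: filters that reference only partition columns drive partition pruning, while only data-column filters are candidates for source pushdown. A minimal, self-contained sketch of that split (hypothetical `Filter` model, not Spark's actual API):

```scala
// Hypothetical sketch, not Spark's API: a filter is modeled only by the
// set of column names it references. Filters whose columns are all
// partition columns are kept for pruning; the rest may be pushed down.
object FilterSplitSketch {
  case class Filter(name: String, references: Set[String])

  def splitFilters(
      filters: Seq[Filter],
      partitionColumns: Set[String]): (Seq[Filter], Seq[Filter]) = {
    // Left: partition filters (pruning only). Right: data filters (pushdown).
    filters.partition(f => f.references.subsetOf(partitionColumns))
  }

  def main(args: Array[String]): Unit = {
    val filters = Seq(
      Filter("p1 = 1", Set("p1")),
      Filter("value <> 'a'", Set("value")))
    val (partitionFilters, dataFilters) =
      splitFilters(filters, partitionColumns = Set("p1", "p2"))
    println(partitionFilters.map(_.name)) // used only for partition pruning
    println(dataFilters.map(_.name))      // eligible for ORC pushdown
  }
}
```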
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing test suites
Closes #33680 from huaxingao/orc_filter.
Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit b04330cd38e2817748ff50a7bf63b7145ea85cd4)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala | 3 ++-
sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala
index a8c813a..dc59526 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala
@@ -53,7 +53,8 @@ case class OrcScanBuilder(
   override def pushFilters(filters: Array[Filter]): Array[Filter] = {
     if (sparkSession.sessionState.conf.orcFilterPushDown) {
-      val dataTypeMap = OrcFilters.getSearchableTypeMap(schema, SQLConf.get.caseSensitiveAnalysis)
+      val dataTypeMap = OrcFilters.getSearchableTypeMap(
+        readDataSchema(), SQLConf.get.caseSensitiveAnalysis)
       _pushedFilters = OrcFilters.convertibleFilters(dataTypeMap, filters).toArray
     }
     filters
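The hunk above builds the searchable type map from `readDataSchema()` instead of the full `schema`, so filters over partition columns (which are absent from the read data schema) no longer count as convertible ORC filters. A simplified sketch of that effect (hypothetical types, not Spark's `OrcFilters` API):

```scala
// Hypothetical sketch of why the schema swap matters: a filter is only
// "convertible" for pushdown if its column appears in the type map, so
// building the map from the data schema alone excludes partition filters.
object ConvertibleSketch {
  type TypeMap = Map[String, String] // column name -> type name (simplified)

  case class Filter(name: String, column: String)

  def convertibleFilters(dataTypeMap: TypeMap, filters: Seq[Filter]): Seq[Filter] =
    filters.filter(f => dataTypeMap.contains(f.column))

  def main(args: Array[String]): Unit = {
    val fullSchema = Map("value" -> "string", "p1" -> "int") // data + partition columns
    val readDataSchema = Map("value" -> "string")            // data columns only
    val filters = Seq(
      Filter("IsNotNull(value)", "value"),
      Filter("EqualTo(p1,1)", "p1"))
    println(convertibleFilters(fullSchema, filters).map(_.name))     // both filters convert
    println(convertibleFilters(readDataSchema, filters).map(_.name)) // only the value filter converts
  }
}
```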
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
index 2086f97..52daa3d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
@@ -460,7 +460,7 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite
       "parquet" ->
         "|PushedFilters: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",
       "orc" ->
-        "|PushedFilters: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]",
+        "|PushedFilters: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",
       "csv" ->
         "|PushedFilters: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",
       "json" ->
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]