This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 10f7f6e  [SPARK-36454][SQL] Not push down partition filter to ORCScan for DSv2
10f7f6e is described below
commit 10f7f6e62b4b0b00a54b4b85848772cb5234183c
Author: Huaxin Gao <[email protected]>
AuthorDate: Mon Aug 9 10:47:03 2021 -0700
[SPARK-36454][SQL] Not push down partition filter to ORCScan for DSv2
### What changes were proposed in this pull request?
Stop pushing down partition filters to `ORCScan` for DSv2.
### Why are the changes needed?
Partition filters are only used for partition pruning and shouldn't be pushed down to `ORCScan`. We already don't push partition filters down to `ORCScan` in DSv1:
```
== Physical Plan ==
*(1) Filter (isnotnull(value#19) AND NOT (value#19 = a))
+- *(1) ColumnarToRow
   +- FileScan orc [value#19,p1#20,p2#21] Batched: true, DataFilters: [isnotnull(value#19), NOT (value#19 = a)], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/pt/_5f4sxy56x70dv9zpz032f0m0000gn/T/spark-c1..., PartitionFilters: [isnotnull(p1#20), isnotnull(p2#21), (p1#20 = 1), (p2#21 = 2)], PushedFilters: [IsNotNull(value), Not(EqualTo(value,a))], ReadSchema: struct<value:string>
```
Also, we don't push down partition filters for Parquet in DSv2:
https://github.com/apache/spark/pull/30652
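Conceptually, DSv2 should behave like DSv1 here: filters that reference only partition columns drive partition pruning, while only data-column filters are candidates for source pushdown. A minimal, self-contained sketch of that split (hypothetical `Filter` model, not Spark's actual API):

```scala
// Hypothetical sketch, not Spark's API: a filter is modeled only by the
// set of column names it references. Filters whose columns are all
// partition columns are kept for pruning; the rest may be pushed down.
object FilterSplitSketch {
  case class Filter(name: String, references: Set[String])

  def splitFilters(
      filters: Seq[Filter],
      partitionColumns: Set[String]): (Seq[Filter], Seq[Filter]) = {
    // Left: partition filters (pruning only). Right: data filters (pushdown).
    filters.partition(f => f.references.subsetOf(partitionColumns))
  }

  def main(args: Array[String]): Unit = {
    val filters = Seq(
      Filter("p1 = 1", Set("p1")),
      Filter("value <> 'a'", Set("value")))
    val (partitionFilters, dataFilters) =
      splitFilters(filters, partitionColumns = Set("p1", "p2"))
    println(partitionFilters.map(_.name)) // used only for partition pruning
    println(dataFilters.map(_.name))      // eligible for ORC pushdown
  }
}
```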
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing test suites
Closes #33680 from huaxingao/orc_filter.
Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit b04330cd38e2817748ff50a7bf63b7145ea85cd4)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala | 3 ++-
sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala
index a8c813a..dc59526 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala
@@ -53,7 +53,8 @@ case class OrcScanBuilder(
   override def pushFilters(filters: Array[Filter]): Array[Filter] = {
     if (sparkSession.sessionState.conf.orcFilterPushDown) {
-      val dataTypeMap = OrcFilters.getSearchableTypeMap(schema, SQLConf.get.caseSensitiveAnalysis)
+      val dataTypeMap = OrcFilters.getSearchableTypeMap(
+        readDataSchema(), SQLConf.get.caseSensitiveAnalysis)
       _pushedFilters = OrcFilters.convertibleFilters(dataTypeMap, filters).toArray
     }
     filters
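The hunk above builds the searchable type map from `readDataSchema()` instead of the full `schema`, so filters over partition columns (which are absent from the read data schema) no longer count as convertible ORC filters. A simplified sketch of that effect (hypothetical types, not Spark's `OrcFilters` API):

```scala
// Hypothetical sketch of why the schema swap matters: a filter is only
// "convertible" for pushdown if its column appears in the type map, so
// building the map from the data schema alone excludes partition filters.
object ConvertibleSketch {
  type TypeMap = Map[String, String] // column name -> type name (simplified)

  case class Filter(name: String, column: String)

  def convertibleFilters(dataTypeMap: TypeMap, filters: Seq[Filter]): Seq[Filter] =
    filters.filter(f => dataTypeMap.contains(f.column))

  def main(args: Array[String]): Unit = {
    val fullSchema = Map("value" -> "string", "p1" -> "int") // data + partition columns
    val readDataSchema = Map("value" -> "string")            // data columns only
    val filters = Seq(
      Filter("IsNotNull(value)", "value"),
      Filter("EqualTo(p1,1)", "p1"))
    println(convertibleFilters(fullSchema, filters).map(_.name))     // both filters convert
    println(convertibleFilters(readDataSchema, filters).map(_.name)) // only the value filter converts
  }
}
```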
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
index 2086f97..52daa3d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
@@ -460,7 +460,7 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite
       "parquet" ->
         "|PushedFilters: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",
       "orc" ->
-        "|PushedFilters: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]",
+        "|PushedFilters: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",
       "csv" ->
         "|PushedFilters: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",
       "json" ->
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]