This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d3e3df1  [SPARK-36644][SQL] Push down boolean column filter
d3e3df1 is described below

commit d3e3df17aac577e163cab7e085624f94f07c748e
Author: Kazuyuki Tanimura <ktanim...@apple.com>
AuthorDate: Fri Sep 3 07:39:14 2021 +0000

    [SPARK-36644][SQL] Push down boolean column filter

    ### What changes were proposed in this pull request?
    This PR proposes to improve `DataSourceStrategy` so that it can push down boolean column filters. Currently, boolean column filters are not pushed down, which may cause unnecessary IO.

    ### Why are the changes needed?
    The following query does not push down the filter in the current implementation
    ```
    SELECT * FROM t WHERE boolean_field
    ```
    although the following query pushes down the filter as expected.
    ```
    SELECT * FROM t WHERE boolean_field = true
    ```
    This is because the Physical Planner (`DataSourceStrategy`) currently pushes down only a limited set of expression patterns, such as `EqualTo`. It is fair for Spark SQL users to expect `boolean_field` to perform the same as `boolean_field = true`.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Added unit tests
    ```
    build/sbt "core/testOnly *DataSourceStrategySuite -- -z SPARK-36644"
    ```

    Closes #33898 from kazuyukitanimura/SPARK-36644.

    Authored-by: Kazuyuki Tanimura <ktanim...@apple.com>
    Signed-off-by: DB Tsai <d_t...@apple.com>
---
 .../apache/spark/sql/execution/datasources/DataSourceStrategy.scala | 3 +++
 .../spark/sql/execution/datasources/DataSourceStrategySuite.scala   | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index 7a5c343..30818b1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -552,6 +552,9 @@ object DataSourceStrategy
     case expressions.Literal(false, BooleanType) =>
       Some(sources.AlwaysFalse)
 
+    case e @ pushableColumn(name) if e.dataType.isInstanceOf[BooleanType] =>
+      Some(sources.EqualTo(name, true))
+
     case _ => None
   }
 
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
index b94918e..37fe3c2 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
@@ -311,6 +311,10 @@ class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
     assert(PushableColumnAndNestedColumn.unapply(Abs('col.int)) === None)
   }
 
+  test("SPARK-36644: Push down boolean column filter") {
+    testTranslateFilter('col.boolean, Some(sources.EqualTo("col", true)))
+  }
+
   /**
    * Translate the given Catalyst [[Expression]] into data source [[sources.Filter]]
    * then verify against the given [[sources.Filter]].
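
The translation rule added by this commit can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the types below (`Column`, `Literal`, `EqualTo`, `EqualToFilter`, and the `TranslateSketch` object) are hypothetical stand-ins written for this example, not Spark's Catalyst expressions or `sources.Filter` classes, and the match covers only the cases visible in the diff plus one `EqualTo` case for comparison.

```scala
// Hypothetical, simplified stand-ins for illustration only.
// None of these types are Spark's real Catalyst or sources classes.
sealed trait DataType
case object BooleanType extends DataType
case object IntegerType extends DataType

sealed trait Expr
final case class Column(name: String, dataType: DataType) extends Expr
final case class Literal(value: Any, dataType: DataType) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr

sealed trait Filter
final case class EqualToFilter(attribute: String, value: Any) extends Filter
case object AlwaysTrue extends Filter
case object AlwaysFalse extends Filter

object TranslateSketch {
  // Returns Some(filter) when the expression can be pushed down, else None.
  def translate(e: Expr): Option[Filter] = e match {
    // `col = literal` was already translatable before the change.
    case EqualTo(Column(name, _), Literal(v, _)) => Some(EqualToFilter(name, v))
    case Literal(true, BooleanType)  => Some(AlwaysTrue)
    case Literal(false, BooleanType) => Some(AlwaysFalse)
    // Analogue of the new case: a bare boolean column is treated as `col = true`.
    case Column(name, BooleanType) => Some(EqualToFilter(name, true))
    // Anything else (including a bare non-boolean column) is not pushed down.
    case _ => None
  }
}
```

In this model, `TranslateSketch.translate(Column("boolean_field", BooleanType))` and the explicit `EqualTo(Column("boolean_field", BooleanType), Literal(true, BooleanType))` yield the same filter, which mirrors the equivalence the commit message describes. A bare column of another type still falls through to `None`.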