This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new d3e3df1  [SPARK-36644][SQL] Push down boolean column filter
d3e3df1 is described below

commit d3e3df17aac577e163cab7e085624f94f07c748e
Author: Kazuyuki Tanimura <ktanim...@apple.com>
AuthorDate: Fri Sep 3 07:39:14 2021 +0000

    [SPARK-36644][SQL] Push down boolean column filter
    
    ### What changes were proposed in this pull request?
    This PR improves `DataSourceStrategy` so that it can push down boolean column filters. Currently, boolean column filters are not pushed down, which may cause unnecessary I/O.
    
    ### Why are the changes needed?
    The following query does not push down the filter in the current implementation:
    ```
    SELECT * FROM t WHERE boolean_field
    ```
    whereas the following query pushes down the filter as expected:
    ```
    SELECT * FROM t WHERE boolean_field = true
    ```
    This is because the physical planner (`DataSourceStrategy`) currently pushes down only a limited set of expression patterns, such as `EqualTo`.
    Spark SQL users can reasonably expect `boolean_field` to perform the same as `boolean_field = true`.
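    
    As a rough illustration of the pattern described above, the sketch below models the translation step with simplified stand-in types. `Column`, `Literal`, `EqualToFilter`, and `TranslateSketch` are hypothetical names invented for this example, not Spark's `expressions` or `sources` classes; only the shape of the match mirrors the change.
    ```scala
    // Simplified stand-ins (assumptions, not Spark's actual classes).
    sealed trait DataType
    case object BooleanType extends DataType
    case object IntegerType extends DataType

    sealed trait Expr
    final case class Column(name: String, dataType: DataType) extends Expr
    final case class Literal(value: Any, dataType: DataType) extends Expr
    final case class EqualTo(left: Expr, right: Expr) extends Expr

    sealed trait Filter
    final case class EqualToFilter(attribute: String, value: Any) extends Filter
    case object AlwaysTrue extends Filter
    case object AlwaysFalse extends Filter

    object TranslateSketch {
      // Returns Some(filter) when the expression can be pushed down, None otherwise.
      def translate(e: Expr): Option[Filter] = e match {
        // Explicit comparison: `col = literal`.
        case EqualTo(Column(name, _), Literal(v, _)) => Some(EqualToFilter(name, v))
        case Literal(true, BooleanType) => Some(AlwaysTrue)
        case Literal(false, BooleanType) => Some(AlwaysFalse)
        // Bare boolean column: treat it as `col = true`.
        case Column(name, BooleanType) => Some(EqualToFilter(name, true))
        // Anything else is not pushed down in this sketch.
        case _ => None
      }
    }
    ```
    With these stand-ins, `TranslateSketch.translate(Column("flag", BooleanType))` and `TranslateSketch.translate(EqualTo(Column("flag", BooleanType), Literal(true, BooleanType)))` both yield `Some(EqualToFilter("flag", true))`, while a bare non-boolean column yields `None`.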
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Added unit tests
    ```
    build/sbt "sql/testOnly *DataSourceStrategySuite -- -z SPARK-36644"
    ```
    
    Closes #33898 from kazuyukitanimura/SPARK-36644.
    
    Authored-by: Kazuyuki Tanimura <ktanim...@apple.com>
    Signed-off-by: DB Tsai <d_t...@apple.com>
---
 .../apache/spark/sql/execution/datasources/DataSourceStrategy.scala   | 3 +++
 .../spark/sql/execution/datasources/DataSourceStrategySuite.scala     | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index 7a5c343..30818b1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -552,6 +552,9 @@ object DataSourceStrategy
     case expressions.Literal(false, BooleanType) =>
       Some(sources.AlwaysFalse)
 
+    case e @ pushableColumn(name) if e.dataType.isInstanceOf[BooleanType] =>
+      Some(sources.EqualTo(name, true))
+
     case _ => None
   }
 
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
index b94918e..37fe3c2 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala
@@ -311,6 +311,10 @@ class DataSourceStrategySuite extends PlanTest with SharedSparkSession {
     assert(PushableColumnAndNestedColumn.unapply(Abs('col.int)) === None)
   }
 
+  test("SPARK-36644: Push down boolean column filter") {
+    testTranslateFilter('col.boolean, Some(sources.EqualTo("col", true)))
+  }
+
   /**
    * Translate the given Catalyst [[Expression]] into data source [[sources.Filter]]
    * then verify against the given [[sources.Filter]].

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org