This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 3ce335d  [SPARK-33267][SQL] Fix NPE issue on 'In' filter when one of values contains null
3ce335d is described below

commit 3ce335d98cb3ce9a4c5a961d0f021fc4782b0f0c
Author: Jungtaek Lim (HeartSaVioR) <[email protected]>
AuthorDate: Wed Oct 28 10:00:29 2020 -0700

    [SPARK-33267][SQL] Fix NPE issue on 'In' filter when one of values contains null
    
    ### What changes were proposed in this pull request?
    
    This PR fixes an NPE in the `In` filter when one of its values is null.
    In practice, the issue is triggered when a filter of the form
    `in (..., null)` is pushed down to a V2 source table. `DataSourceStrategy`
    caches the mapping (filter instance -> expression) in a HashMap, which
    calls `hashCode` on the key, and `In.hashCode` called `hashCode()` on each
    value without a null check.
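
    The failure mode described above can be reproduced without Spark. The
    following is a minimal sketch, not the actual Spark code: `InHashSketch`,
    `InSketch`, and `unsafeHash` are hypothetical names introduced only for
    illustration, and the element-wise `equals` is an assumption added so the
    sketch can serve as a map key. The two hash computations mirror the
    removed and added lines of the patch.

```scala
import scala.collection.mutable
import scala.util.Try

object InHashSketch {
  // Hypothetical stand-in for an `In`-style filter; not Spark's class.
  final case class InSketch(attribute: String, values: Array[Any]) {
    // JVM arrays compare by reference, so compare the elements instead.
    override def equals(other: Any): Boolean = other match {
      case InSketch(a, vs) => a == attribute && vs.sameElements(values)
      case _ => false
    }

    // Null-safe hash, as in the patched line: a null value contributes 0.
    override def hashCode(): Int = {
      var h = attribute.hashCode
      values.foreach { v =>
        h *= 41
        h += (if (v != null) v.hashCode() else 0)
      }
      h
    }
  }

  // Pre-patch computation, kept separate to show the failure.
  def unsafeHash(attribute: String, values: Array[Any]): Int = {
    var h = attribute.hashCode
    values.foreach { v =>
      h *= 41
      h += v.hashCode() // throws NullPointerException when v is null
    }
    h
  }

  def main(args: Array[String]): Unit = {
    val values = Array[Any](1, null)

    // The unguarded hash fails as soon as it reaches the null element.
    println(Try(unsafeHash("i", values)).isFailure)

    // With the null-safe hash, the filter can be used as a HashMap key.
    val cache = mutable.HashMap.empty[InSketch, String]
    cache(InSketch("i", values)) = "i IN (1, NULL)"
    println(cache.get(InSketch("i", Array[Any](1, null))))
  }
}
```

    Any hash-based lookup keyed on the filter hits the same code path, which
    is why the exception surfaced during push down rather than when the
    filter was constructed.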
    
    ### Why are the changes needed?
    
    This is an obvious bug: the `In` filter does not account for null values
    when calculating its hash code.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. Previously, a query with `null` in an "in" condition against a data
    source V2 table that supports filter push down failed with an NPE. With
    this PR, the query no longer fails.
    
    ### How was this patch tested?
    
    UT added. The new UT fails without the PR and passes with the PR.
    
    Closes #30170 from HeartSaVioR/SPARK-33267.
    
    Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit a744fea3be12f1a53ab553040b95da730210bc88)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 .../src/main/scala/org/apache/spark/sql/sources/filters.scala  |  2 +-
 .../org/apache/spark/sql/connector/DataSourceV2Suite.scala     | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala
index 7533793..2b44a3a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala
@@ -164,7 +164,7 @@ case class In(attribute: String, values: Array[Any]) extends Filter {
     var h = attribute.hashCode
     values.foreach { v =>
       h *= 41
-      h += v.hashCode()
+      h += (if (v != null) v.hashCode() else 0)
     }
     h
   }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
index a9c521e..c2edcce 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala
@@ -413,6 +413,16 @@ class DataSourceV2Suite extends QueryTest with SharedSparkSession with AdaptiveS
       }
     }
   }
+
+  test("SPARK-33267: push down with condition 'in (..., null)' should not throw NPE") {
+    Seq(classOf[AdvancedDataSourceV2], classOf[JavaAdvancedDataSourceV2]).foreach { cls =>
+      withClue(cls.getName) {
+        val df = spark.read.format(cls.getName).load()
+        // before SPARK-33267 below query just threw NPE
+        df.select('i).where("i in (1, null)").collect()
+      }
+    }
+  }
 }
 
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
