Hyukjin Kwon created SPARK-24934:
------------------------------------

Summary: Should handle missing upper/lower bounds cases in in-memory partition pruning
Key: SPARK-24934
URL: https://issues.apache.org/jira/browse/SPARK-24934
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Hyukjin Kwon

For example, if an array column is used (where the lower and upper bounds for its column batch are {{null}}), filtering on the cached DataFrame wrongly filters out all data:

{code}
scala> import org.apache.spark.sql.functions
import org.apache.spark.sql.functions

scala> val df = Seq(Array("a", "b"), Array("c", "d")).toDF("arrayCol")
df: org.apache.spark.sql.DataFrame = [arrayCol: array<string>]

scala> df.filter(df.col("arrayCol").eqNullSafe(functions.array(functions.lit("a"), functions.lit("b")))).show()
+--------+
|arrayCol|
+--------+
|  [a, b]|
+--------+

scala> df.cache().filter(df.col("arrayCol").eqNullSafe(functions.array(functions.lit("a"), functions.lit("b")))).show()
+--------+
|arrayCol|
+--------+
+--------+
{code}
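
The failure mode reported here can be illustrated with a toy model of bounds-based batch pruning. This is only a sketch under stated assumptions, not Spark's actual implementation: `BatchStats`, `PruningSketch`, `mightContain`, and `mightContainNaive` are hypothetical names, and plain strings stand in for column values. It shows why a batch with missing bounds should be scanned rather than skipped.

```scala
// Hypothetical, simplified model of the min/max statistics kept for one
// cached column batch. None means no bound was collected for the column
// (assumed here to be the situation for array columns).
final case class BatchStats(lower: Option[String], upper: Option[String])

object PruningSketch {
  // Returns true when the batch might contain `value` and must be scanned.
  // Missing bounds are treated as "unknown", so the batch is never skipped.
  def mightContain(stats: BatchStats, value: String): Boolean =
    (stats.lower, stats.upper) match {
      case (Some(lo), Some(hi)) =>
        lo.compareTo(value) <= 0 && value.compareTo(hi) <= 0
      case _ => true
    }

  // Naive variant: a missing bound makes the comparison fail, so the batch
  // is skipped even though it may hold matching rows.
  def mightContainNaive(stats: BatchStats, value: String): Boolean =
    stats.lower.exists(_.compareTo(value) <= 0) &&
      stats.upper.exists(_.compareTo(value) >= 0)
}
```

With `BatchStats(None, None)`, `mightContainNaive` returns false and the batch is dropped, which mirrors the empty result seen on the cached DataFrame, while `mightContain` keeps the batch so the filter can be evaluated row by row.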