Ryan Blue created PARQUET-1510:
----------------------------------
Summary: Dictionary filter skips null values when evaluating not-equals.
Key: PARQUET-1510
URL: https://issues.apache.org/jira/browse/PARQUET-1510
Project: Parquet
Issue Type: Improvement
Reporter: Ryan Blue
This was discovered in Spark; see SPARK-26677. The following examples are from the Spark PR:
{code}
// Repeat the values to get dictionary encoding.
Seq(Some("A"), Some("A"), None).toDF
  .repartition(1).write.mode("overwrite").parquet("/tmp/foo")
// Incorrect result: the null row is missing.
spark.read.parquet("/tmp/foo").where("NOT (value <=> 'A')").show()
+-----+
|value|
+-----+
+-----+
{code}
{code}
// Use plain encoding.
Seq(Some("A"), None).toDF
  .repartition(1).write.mode("overwrite").parquet("/tmp/bar")
// Correct result: the null row is returned.
spark.read.parquet("/tmp/bar").where("NOT (value <=> 'A')").show()
+-----+
|value|
+-----+
| null|
+-----+
{code}
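The same predicate returns the null row when the column is plain-encoded but drops it when the column is dictionary-encoded, so this is a correctness issue.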
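A minimal sketch of the kind of check involved, assuming a simplified model in which a column chunk is summarized by its dictionary (which holds only non-null values) and a null count. ChunkSummary, canDropNotEq, and canDropIgnoringNulls are hypothetical names for illustration and are not Parquet's API. The point is that a chunk can only be skipped for a not-equals predicate when no row can match, and a null row does match when nulls are treated as not equal to the literal:
{code}
// Hypothetical summary of a column chunk; not a Parquet class.
final case class ChunkSummary(dictionary: Set[String], nullCount: Long)

object DictionaryFilterSketch {
  // Null-aware check: drop the chunk only if every non-null value equals
  // the literal AND the chunk has no nulls (a null row would match).
  def canDropNotEq(chunk: ChunkSummary, literal: String): Boolean =
    chunk.dictionary.forall(_ == literal) && chunk.nullCount == 0

  // Check that looks at the dictionary alone. Nulls never appear in the
  // dictionary, so this wrongly drops chunks that still hold null rows.
  def canDropIgnoringNulls(chunk: ChunkSummary, literal: String): Boolean =
    chunk.dictionary.forall(_ == literal)
}
{code}
With the data from the first example, the dictionary is Set("A") and there is one null: the dictionary-only check would drop the chunk, while the null-aware check keeps it so the null row can be returned.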