Ryan Blue created PARQUET-1510:
----------------------------------

             Summary: Dictionary filter skips null values when evaluating not-equals.
                 Key: PARQUET-1510
                 URL: https://issues.apache.org/jira/browse/PARQUET-1510
             Project: Parquet
          Issue Type: Improvement
            Reporter: Ryan Blue


This was discovered in Spark; see SPARK-26677. The reproduction below is from the Spark PR:

{code}
// Repeat the values to get dictionary encoding.
Seq(Some("A"), Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/foo")
spark.read.parquet("/tmp/foo").where("NOT (value <=> 'A')").show()
+-----+
|value|
+-----+
+-----+
{code}

{code}
// Use plain encoding.
Seq(Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/bar")
spark.read.parquet("/tmp/bar").where("NOT (value <=> 'A')").show()
+-----+
|value|
+-----+
| null|
+-----+
{code}

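This is a correctness issue: with dictionary encoding the query returns no rows, even though the null row satisfies NOT (value <=> 'A'), as the plain-encoded result shows.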
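
The behavior described in the summary can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the parquet-mr implementation: ColumnChunk, canDropIgnoringNulls, and canDropNullAware are hypothetical names introduced for illustration. It assumes that a not-equals predicate against a non-null value should keep null rows, and that nulls are never stored in the dictionary, so a dictionary containing only "A" does not prove that every row equals "A".

```scala
// Hypothetical model of a dictionary-encoded column chunk (not the parquet-mr API).
// `dictionary` holds the distinct non-null values; nulls are tracked separately.
final case class ColumnChunk(dictionary: Set[String], nullCount: Long)

object NotEqDictionaryFilter {
  // Models the reported behavior: the row group is dropped whenever the
  // dictionary contains only the compared value, ignoring null rows.
  def canDropIgnoringNulls(chunk: ColumnChunk, value: String): Boolean =
    chunk.dictionary == Set(value)

  // Null-aware variant (an assumption about the intended behavior): the row
  // group may be dropped only if there are no nulls, because a null row
  // satisfies "not equal to a non-null value" under null-safe semantics.
  def canDropNullAware(chunk: ColumnChunk, value: String): Boolean =
    chunk.nullCount == 0L && chunk.dictionary == Set(value)
}
```

For the /tmp/foo data above (dictionary containing only "A", one null row), canDropIgnoringNulls returns true and the null row is lost, while canDropNullAware returns false and the row group is still read.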



