Hi everyone,

The Spark community caught a correctness bug in Parquet, PARQUET-1510
<https://issues.apache.org/jira/browse/PARQUET-1510> and SPARK-26677
<https://issues.apache.org/jira/browse/SPARK-26677>. The dictionary filter
ignored null values and incorrectly skipped row groups, which could silently
drop rows from query results.
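For context, here is a rough sketch of the failure mode as I understand it (the
column name and values are made up for illustration): Parquet's notEq predicate
keeps rows where the value is null, but a dictionary only records non-null
values, so a row group whose dictionary holds nothing but the compared value
can look droppable even when it also contains nulls.

    import static org.apache.parquet.filter2.predicate.FilterApi.binaryColumn;
    import static org.apache.parquet.filter2.predicate.FilterApi.notEq;

    import org.apache.parquet.filter2.predicate.FilterPredicate;
    import org.apache.parquet.io.api.Binary;

    public class DictionaryNullSketch {
      public static void main(String[] args) {
        // notEq is defined to keep rows where the column value is null.
        FilterPredicate pred = notEq(binaryColumn("col"), Binary.fromString("a"));

        // Consider a row group whose values are {"a", "a", null}: its dictionary
        // contains only "a". A dictionary-only check concludes nothing can match
        // notEq(col, "a") and drops the whole row group, losing the null row
        // that the predicate should have kept.
        System.out.println(pred);
      }
    }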

Spark is considering disabling Parquet dictionary filters, but PARQUET-1309
<https://issues.apache.org/jira/browse/PARQUET-1309> causes a problem
because the stats and dictionary filter config properties are swapped. It is
also a bad idea to disable filtering for all of Parquet because of a bug like
this. (I've also suggested a work-around that I think is the more likely path
forward.)
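To show why PARQUET-1309 gets in the way, here is a minimal sketch of the
configuration a reader might set, assuming the standard ParquetInputFormat
property constants; with the swapped-property bug, the effect is inverted.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.parquet.hadoop.ParquetInputFormat;

    public class FilterConfigSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Intent: turn off only dictionary filtering, keep stats filtering on.
        conf.setBoolean(ParquetInputFormat.DICTIONARY_FILTERING_ENABLED, false);
        conf.setBoolean(ParquetInputFormat.STATS_FILTERING_ENABLED, true);

        // With PARQUET-1309, the reader picks up these two properties with the
        // keys swapped, so the settings above end up disabling stats filtering
        // and leaving dictionary filtering enabled, the opposite of the intent.
        System.out.println(
            conf.getBoolean(ParquetInputFormat.DICTIONARY_FILTERING_ENABLED, true));
      }
    }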

Since this is a correctness bug, and Spark couldn't update to 1.11.0 in a
Spark patch release even if that release were finished, I think we should
create a 1.10.1 release that includes the fixes for PARQUET-1309 and
PARQUET-1510.

Is everyone okay with me creating a release candidate for 1.10.1? If so,
are there any other bugs that should be fixed in 1.10.1?

rb

-- 
Ryan Blue
Software Engineer
Netflix
