wgtmac commented on code in PR #1029: URL: https://github.com/apache/parquet-mr/pull/1029#discussion_r1108041382
##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java:
##########
@@ -187,10 +196,7 @@ public <T extends Comparable<T>> Boolean visit(NotEq<T> notEq) {
     try {
       Set<T> dictSet = expandDictionary(meta);
-      boolean mayContainNull = (meta.getStatistics() == null
-          || !meta.getStatistics().isNumNullsSet()
-          || meta.getStatistics().getNumNulls() > 0);
-      if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value) && !mayContainNull) {
+      if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value)) {

Review Comment:
   I just noticed that `FilterPredicate` does not provide an entry for `IS NULL` or `IS NOT NULL`. This confuses me because `col IS NOT NULL` is not equivalent to `col != NULL`.

   Correct me if I'm wrong, but `col NOT EQ A` has two cases:
   - If A is NULL, it should return an empty list, because NULL cannot be compared to any value, including another NULL.
   - Otherwise, it should return the list of values excluding both A and NULL.

   cc @huaxingao @gszadovszky @shangxinli
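The two cases above can be sketched as follows. This is only an illustration of the SQL-style three-valued semantics described in the comment, not of how `FilterPredicate` or `DictionaryFilter` actually behave; the class `SqlNotEqSketch` and its methods `notEq` and `filterNotEq` are hypothetical names and do not exist in parquet-mr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of parquet-mr: models SQL-style `col <> a`.
class SqlNotEqSketch {

  // Three-valued comparison: returns null (UNKNOWN) when either side is NULL,
  // otherwise TRUE or FALSE.
  static <T> Boolean notEq(T columnValue, T a) {
    if (columnValue == null || a == null) {
      return null;
    }
    return !columnValue.equals(a);
  }

  // A WHERE clause keeps only rows whose predicate is TRUE;
  // rows evaluating to FALSE or UNKNOWN are dropped.
  static <T> List<T> filterNotEq(List<T> column, T a) {
    List<T> kept = new ArrayList<>();
    for (T v : column) {
      if (Boolean.TRUE.equals(notEq(v, a))) {
        kept.add(v);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> column = Arrays.asList("a", "b", null, "a", "c");
    System.out.println(filterNotEq(column, "a"));  // [b, c]
    System.out.println(filterNotEq(column, null)); // []
  }
}
```

If these semantics applied, a column chunk whose dictionary contains only A could be dropped for `col NOT EQ A` whether or not it also holds nulls, since the null rows would evaluate to UNKNOWN rather than TRUE.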