wgtmac commented on code in PR #1029:
URL: https://github.com/apache/parquet-mr/pull/1029#discussion_r1108041382
##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java:
##########
@@ -187,10 +196,7 @@ public <T extends Comparable<T>> Boolean visit(NotEq<T> notEq) {
try {
Set<T> dictSet = expandDictionary(meta);
- boolean mayContainNull = (meta.getStatistics() == null
- || !meta.getStatistics().isNumNullsSet()
- || meta.getStatistics().getNumNulls() > 0);
- if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value) && !mayContainNull) {
+ if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value)) {
Review Comment:
I just noticed that `FilterPredicate` does not provide a predicate for `IS NULL` or `IS NOT NULL`. This confuses me, because in SQL `col IS NOT NULL` is not equivalent to `col != NULL`.
Correct me if I'm wrong, but `col NOT EQ A` has two meanings:
- If A is NULL, it should return an empty result, because NULL cannot be compared to any value, including another NULL.
- Otherwise, it should return all values except A and NULL.
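To make the two cases concrete, here is a minimal, self-contained Java sketch of the SQL semantics described above. It assumes three-valued logic, where any comparison involving NULL yields UNKNOWN and a filter keeps a row only when its predicate is TRUE. Java `null` stands in for SQL NULL, and the class and method names (`NotEqSemantics`, `sqlNotEq`, `filterNotEq`, `filterIsNotNull`) are hypothetical illustrations, not part of the parquet-mr `FilterApi`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NotEqSemantics {

  // SQL-style `col != a` under three-valued logic (assumption: null models SQL NULL).
  // Returns null (UNKNOWN) if either side is NULL, otherwise TRUE or FALSE.
  static Boolean sqlNotEq(Integer col, Integer a) {
    if (col == null || a == null) {
      return null; // UNKNOWN: NULL cannot be compared to any value, including NULL
    }
    return !col.equals(a);
  }

  // A WHERE-style filter keeps a row only when the predicate is TRUE
  // (rows evaluating to FALSE or UNKNOWN are dropped).
  static List<Integer> filterNotEq(List<Integer> column, Integer a) {
    List<Integer> result = new ArrayList<>();
    for (Integer v : column) {
      if (Boolean.TRUE.equals(sqlNotEq(v, a))) {
        result.add(v);
      }
    }
    return result;
  }

  // `col IS NOT NULL` is a separate test that never evaluates to UNKNOWN.
  static List<Integer> filterIsNotNull(List<Integer> column) {
    List<Integer> result = new ArrayList<>();
    for (Integer v : column) {
      if (v != null) {
        result.add(v);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> column = Arrays.asList(1, 2, null, 3);
    System.out.println(filterNotEq(column, null)); // []
    System.out.println(filterNotEq(column, 2));    // [1, 3]
    System.out.println(filterIsNotNull(column));   // [1, 2, 3]
  }
}
```

Under these assumptions, `filterNotEq(column, null)` keeps nothing while `filterIsNotNull(column)` keeps every non-null value, which is why `col != NULL` and `col IS NOT NULL` cannot be used interchangeably in SQL.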
cc @huaxingao @gszadovszky @shangxinli