wgtmac commented on code in PR #1029:
URL: https://github.com/apache/parquet-mr/pull/1029#discussion_r1108041382


##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java:
##########
@@ -187,10 +196,7 @@ public <T extends Comparable<T>> Boolean visit(NotEq<T> notEq) {
 
     try {
       Set<T> dictSet = expandDictionary(meta);
-      boolean mayContainNull = (meta.getStatistics() == null
-          || !meta.getStatistics().isNumNullsSet()
-          || meta.getStatistics().getNumNulls() > 0);
-      if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value) && !mayContainNull) {
+      if (dictSet != null && dictSet.size() == 1 && dictSet.contains(value)) {

Review Comment:
   I just noticed that `FilterPredicate` does not provide an operator for
   `IS NULL` or `IS NOT NULL`. This confuses me because `col IS NOT NULL` is
   not equivalent to `col != NULL`.
   
   Correct me if I'm wrong, but `col NOT EQ A` has two meanings:
   - If A is NULL, it should return an empty list, because NULL cannot be
     compared to any value, including another NULL.
   - Otherwise, it should return the list of values excluding both A and NULL.
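
To make the two cases above concrete, here is a minimal sketch of the SQL-style reading, where a row is kept only when the comparison evaluates to TRUE. The class and method names (`NotEqSemantics`, `sqlNotEq`, `isNotNull`) are hypothetical and exist only for illustration. This models the semantics described in this comment, not necessarily how `FilterPredicate` currently evaluates `NotEq`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of parquet-mr.
final class NotEqSemantics {

    // SQL-style `col <> a`: keep a row only when the comparison is TRUE.
    // Any comparison that involves NULL is UNKNOWN, so that row is dropped.
    static <T> List<T> sqlNotEq(List<T> column, T a) {
        List<T> kept = new ArrayList<>();
        for (T v : column) {
            if (v != null && a != null && !v.equals(a)) {
                kept.add(v);
            }
        }
        return kept;
    }

    // `col IS NOT NULL`: keep every non-null value, whatever it is.
    static <T> List<T> isNotNull(List<T> column) {
        List<T> kept = new ArrayList<>();
        for (T v : column) {
            if (v != null) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> col = Arrays.asList("a", "b", null, "a");
        System.out.println(sqlNotEq(col, "a"));  // prints [b]
        System.out.println(sqlNotEq(col, null)); // prints []
        System.out.println(isNotNull(col));      // prints [a, b, a]
    }
}
```

Under this reading, `sqlNotEq(col, null)` is always empty while `isNotNull(col)` is not, which is why the two predicates cannot stand in for each other.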
   
   cc @huaxingao @gszadovszky @shangxinli 



