gszadovszky commented on code in PR #1328:
URL: https://github.com/apache/parquet-mr/pull/1328#discussion_r1580931878


##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java:
##########
@@ -315,6 +321,66 @@ public <R> R accept(Visitor<R> visitor) {
     }
   }
 
+  public abstract static class ArrayContainsFilterPredicate<T extends 
Comparable<T>>

Review Comment:
   It seems this class does not need to be public. Let's have it package 
private.



##########
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java:
##########
@@ -315,6 +321,66 @@ public <R> R accept(Visitor<R> visitor) {
     }
   }
 
+  public abstract static class ArrayContainsFilterPredicate<T extends 
Comparable<T>>
+      implements FilterPredicate, Serializable {
+    private final Column<T> elementColumn;
+    private final T value;
+
+    protected ArrayContainsFilterPredicate(Column<T> elementColumn, T value) {
+      this.elementColumn = Objects.requireNonNull(elementColumn, 
"elementColumn cannot be null");
+      this.value = value;
+    }
+
+    public Column<T> getColumn() {
+      return elementColumn;
+    }
+
+    public T getValue() {
+      return value;
+    }
+
+    @Override
+    public String toString() {
+      String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
+      return name + "(" + elementColumn.getColumnPath().toDotString() + ", " + 
value + ")";
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) return true;
+      if (o == null || getClass() != o.getClass()) return false;
+      ArrayContainsFilterPredicate<?> that = (ArrayContainsFilterPredicate<?>) 
o;
+      return elementColumn.equals(that.elementColumn) && 
value.equals(that.value);

Review Comment:
   It seems that `value` can be `null`. In this case it'll throw an NPE.



##########
parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java:
##########
@@ -487,6 +489,92 @@ public <T extends Comparable<T>> Boolean visit(NotIn<T> 
notIn) {
     return BLOCK_MIGHT_MATCH;
   }
 
+  @Override
+  public <T extends Comparable<T>> Boolean visit(Contains<T> contains) {
+    T value = contains.getValue();
+
+    // Dictionary only contains non-null values, so we can't use it to filter 
here
+    if (value == null) {
+      return BLOCK_MIGHT_MATCH;
+    }
+
+    Column<T> filterColumn = contains.getColumn();
+    ColumnChunkMetaData meta = getColumnChunk(filterColumn.getColumnPath());
+
+    if (meta == null) {
+      // the column isn't in this file so all values are null, but the value
+      // must be non-null because of the above check.
+      return BLOCK_CANNOT_MATCH;
+    }
+
+    // if the chunk has non-dictionary pages, don't bother decoding the
+    // dictionary because the row group can't be eliminated.
+    if (hasNonDictionaryPages(meta)) {
+      return BLOCK_MIGHT_MATCH;
+    }
+
+    try {
+      Set<T> dictSet = expandDictionary(meta);
+      if (dictSet != null && dictSet.contains(value)) {
+        return BLOCK_MIGHT_MATCH;
+      } else {
+        return BLOCK_CANNOT_MATCH;
+      }
+    } catch (IOException e) {
+      LOG.warn("Failed to process dictionary for filter evaluation.", e);
+    }
+    return BLOCK_MIGHT_MATCH; // cannot drop the row group based on this 
dictionary
+  }
+
+  @Override
+  public <T extends Comparable<T>> Boolean visit(DoesNotContain<T> 
doesNotContain) {
+    T value = doesNotContain.getValue();
+
+    Column<T> filterColumn = doesNotContain.getColumn();
+    ColumnChunkMetaData meta = getColumnChunk(filterColumn.getColumnPath());
+
+    if (value == null) {
+      if (meta == null) {
+        // @Todo ??

Review Comment:
   Why the TODO? No metadata means the column is not in the file so every value 
is null. Therefore returning `BLACK_CANNOT_MATCH` is correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to