[ 
https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407416#comment-17407416
 ] 

ASF GitHub Bot commented on PARQUET-1968:
-----------------------------------------

shangxinli commented on a change in pull request #923:
URL: https://github.com/apache/parquet-mr/pull/923#discussion_r698598441



##########
File path: 
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java
##########
@@ -247,6 +250,80 @@ public int hashCode() {
     }
   }
 
+  // base class for In and NotIn

Review comment:
       Please write a better comment here, since this is a public class.

##########
File path: 
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java
##########
@@ -247,6 +250,80 @@ public int hashCode() {
     }
   }
 
+  // base class for In and NotIn
+  public static abstract class SetColumnFilterPredicate<T extends Comparable<T>> implements FilterPredicate, Serializable {
+    private final Column<T> column;
+    private final Set<T> values;
+    private final String toString;
+
+    protected SetColumnFilterPredicate(Column<T> column, Set<T> values) {
+      this.column = Objects.requireNonNull(column, "column cannot be null");
+      this.values = Objects.requireNonNull(values, "values cannot be null");
+      checkArgument(!values.isEmpty(), "values in SetColumnFilterPredicate shouldn't be empty!");
+
+      String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);

Review comment:
       I see you cache the result in a 'toString' field, but is it generally 
reused multiple times? If not, eagerly building the string in the constructor 
is wasted work. 
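
       One way to avoid the eager work is to build the string on the first 
toString() call and cache it from then on. A minimal sketch, assuming a 
hypothetical stand-in class (LazySetPredicate) whose column is reduced to its 
dotted path so it compiles without parquet-mr; it is not the code in this PR: 

```java
import java.util.Locale;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for SetColumnFilterPredicate (not the real class).
class LazySetPredicate<T extends Comparable<T>> {
  private static final int MAX_PRINTED_VALUES = 100; // same cap as the patch

  private final String columnPath; // assumption: replaces Column<T>
  private final Set<T> values;
  private String cachedToString;   // null until toString() is first called

  LazySetPredicate(String columnPath, Set<T> values) {
    this.columnPath = Objects.requireNonNull(columnPath, "columnPath cannot be null");
    this.values = Objects.requireNonNull(values, "values cannot be null");
    if (values.isEmpty()) {
      throw new IllegalArgumentException("values shouldn't be empty");
    }
  }

  @Override
  public String toString() {
    // Unsynchronized on purpose: two threads may both build the string,
    // but String is immutable, so either result is safe to publish.
    String s = cachedToString;
    if (s == null) {
      s = buildString();
      cachedToString = s;
    }
    return s;
  }

  private String buildString() {
    String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
    String shown = values.stream()
        .limit(MAX_PRINTED_VALUES)
        .map(String::valueOf)
        .collect(Collectors.joining(", "));
    String suffix = values.size() > MAX_PRINTED_VALUES ? ", ..." : "";
    return name + "(" + columnPath + ", " + shown + suffix + ")";
  }
}
```

       If the predicate is never printed or logged, nothing is built at all. 
In the real class, which is Serializable, the cache field would probably need 
to be transient; that is an assumption, not something verified against the PR. 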
 

##########
File path: 
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java
##########
@@ -247,6 +250,80 @@ public int hashCode() {
     }
   }
 
+  // base class for In and NotIn
+  public static abstract class SetColumnFilterPredicate<T extends Comparable<T>> implements FilterPredicate, Serializable {
+    private final Column<T> column;
+    private final Set<T> values;
+    private final String toString;
+
+    protected SetColumnFilterPredicate(Column<T> column, Set<T> values) {
+      this.column = Objects.requireNonNull(column, "column cannot be null");
+      this.values = Objects.requireNonNull(values, "values cannot be null");
+      checkArgument(!values.isEmpty(), "values in SetColumnFilterPredicate shouldn't be empty!");
+
+      String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
+      StringBuilder str = new StringBuilder();
+      int iter = 0;
+      for (T value : values) {
+        if (iter >= 100) break;
+        str.append(value).append(", ");
+        iter++;
+      }
+      String valueStr = values.size() <= 100 ? str.substring(0, str.length() - 2) : str + "...";
+      this.toString = name + "(" + column.getColumnPath().toDotString() + ", " + valueStr + ")";

Review comment:
       Would it be possible to merge lines 272 and 273 into the StringBuilder 
code above that builds the string? String concatenation like this can 
sometimes consume a lot of memory. 
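
       As an illustration of that merge, the whole result can be appended to 
one StringBuilder, so no intermediate substring or concatenated copies are 
created. This is only a sketch: PredicateStrings and describe are hypothetical 
names, and the column is passed as its dotted path instead of a Column<T>: 

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of parquet-mr.
final class PredicateStrings {
  private static final int MAX_PRINTED_VALUES = 100; // same cap as the patch

  private PredicateStrings() {}

  static String describe(String name, String columnPath, Set<?> values) {
    StringBuilder str = new StringBuilder();
    str.append(name.toLowerCase(Locale.ENGLISH)).append('(').append(columnPath);
    int printed = 0;
    for (Object value : values) {
      if (printed == MAX_PRINTED_VALUES) {
        // More values remain than we are willing to print.
        str.append(", ...");
        break;
      }
      // Separator goes before each value, so nothing has to be trimmed later.
      str.append(", ").append(value);
      printed++;
    }
    return str.append(')').toString();
  }
}
```

       Writing the separator before each value removes the need for the 
substring call that trims the trailing ", ". 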

##########
File path: 
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java
##########
@@ -247,6 +250,80 @@ public int hashCode() {
     }
   }
 
+  // base class for In and NotIn
+  public static abstract class SetColumnFilterPredicate<T extends Comparable<T>> implements FilterPredicate, Serializable {
+    private final Column<T> column;
+    private final Set<T> values;
+    private final String toString;
+
+    protected SetColumnFilterPredicate(Column<T> column, Set<T> values) {
+      this.column = Objects.requireNonNull(column, "column cannot be null");
+      this.values = Objects.requireNonNull(values, "values cannot be null");
+      checkArgument(!values.isEmpty(), "values in SetColumnFilterPredicate shouldn't be empty!");
+
+      String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
+      StringBuilder str = new StringBuilder();
+      int iter = 0;
+      for (T value : values) {
+        if (iter >= 100) break;
+        str.append(value).append(", ");
+        iter++;
+      }
+      String valueStr = values.size() <= 100 ? str.substring(0, str.length() - 2) : str + "...";
+      this.toString = name + "(" + column.getColumnPath().toDotString() + ", " + valueStr + ")";
+    }
+
+    public Column<T> getColumn() {
+      return column;
+    }
+
+    public Set<T> getValues() {
+      return values;
+    }
+
+    @Override
+    public String toString() {
+      return toString;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) return true;
+      if (o == null || getClass() != o.getClass()) return false;

Review comment:
       I guess you can just 'return this.getClass() == o.getClass()'

##########
File path: 
parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java
##########
@@ -247,6 +250,80 @@ public int hashCode() {
     }
   }
 
+  // base class for In and NotIn
+  public static abstract class SetColumnFilterPredicate<T extends Comparable<T>> implements FilterPredicate, Serializable {
+    private final Column<T> column;
+    private final Set<T> values;
+    private final String toString;
+
+    protected SetColumnFilterPredicate(Column<T> column, Set<T> values) {
+      this.column = Objects.requireNonNull(column, "column cannot be null");
+      this.values = Objects.requireNonNull(values, "values cannot be null");
+      checkArgument(!values.isEmpty(), "values in SetColumnFilterPredicate shouldn't be empty!");
+
+      String name = getClass().getSimpleName().toLowerCase(Locale.ENGLISH);
+      StringBuilder str = new StringBuilder();
+      int iter = 0;
+      for (T value : values) {
+        if (iter >= 100) break;
+        str.append(value).append(", ");
+        iter++;
+      }
+      String valueStr = values.size() <= 100 ? str.substring(0, str.length() - 2) : str + "...";
+      this.toString = name + "(" + column.getColumnPath().toDotString() + ", " + valueStr + ")";
+    }
+
+    public Column<T> getColumn() {
+      return column;
+    }
+
+    public Set<T> getValues() {
+      return values;
+    }
+
+    @Override
+    public String toString() {
+      return toString;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) return true;
+      if (o == null || getClass() != o.getClass()) return false;
+      SetColumnFilterPredicate<?> that = (SetColumnFilterPredicate<?>) o;
+      return column.equals(that.column) && values.equals(that.values) && Objects.equals(toString, that.toString);

Review comment:
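       Is the toString comparison still needed here? toString appears to be 
built only from the class name, the column, and the values, so it adds nothing 
once the column and values are compared. You can just compare the class here. 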
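
       For illustration, equals and hashCode can rely on the class check plus 
the column and values alone. A minimal sketch, assuming a hypothetical 
stand-in class (SetPredicateSketch) with the column reduced to its dotted 
path; the hashCode shown is an assumption, since the PR's hashCode is not 
quoted here: 

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for SetColumnFilterPredicate (not the real class).
class SetPredicateSketch<T extends Comparable<T>> {
  private final String columnPath; // assumption: replaces Column<T>
  private final Set<T> values;

  SetPredicateSketch(String columnPath, Set<T> values) {
    this.columnPath = Objects.requireNonNull(columnPath, "columnPath cannot be null");
    this.values = Objects.requireNonNull(values, "values cannot be null");
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    // The class check already tells In and NotIn style subclasses apart.
    if (o == null || getClass() != o.getClass()) return false;
    SetPredicateSketch<?> that = (SetPredicateSketch<?>) o;
    // No toString comparison: it would be derived from these same fields.
    return columnPath.equals(that.columnPath) && values.equals(that.values);
  }

  @Override
  public int hashCode() {
    // Uses only fields that equals() compares, so equal objects hash alike.
    return Objects.hash(columnPath, values);
  }
}
```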






> FilterApi support In predicate
> ------------------------------
>
>                 Key: PARQUET-1968
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1968
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> FilterApi should support native In predicate.
> Spark:
> https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala#L600-L605
> Impala:
> https://issues.apache.org/jira/browse/IMPALA-3654


