[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

ASF GitHub Bot (Jira) Tue, 14 Sep 2021 22:31:08 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415323#comment-17415323
 ]


ASF GitHub Bot commented on PARQUET-1968:
-----------------------------------------

huaxingao commented on a change in pull request #923:
URL: https://github.com/apache/parquet-mr/pull/923#discussion_r708850129



##########
File path: 
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java
##########
@@ -293,8 +290,48 @@ boolean isNullPage(int pageIndex) {
 
     @Override
     public <T extends Comparable<T>> PrimitiveIterator.OfInt visit(In<T> in) {
-      IntSet indexes = getMatchingIndexes(in);
-      return IndexIterator.filter(getPageCount(), indexes::contains);
+      Set<T> values = in.getValues();
+      TreeSet<T> myTreeSet = new TreeSet<>();
+      IntSet matchingIndexes1 = new IntOpenHashSet();  // for null
+      Iterator<T> it = values.iterator();
+      while(it.hasNext()) {
+        T value = it.next();
+        if (value != null) {
+          myTreeSet.add(value);
+        } else {
+          if (nullCounts == null) {
+            // Searching for nulls so if we don't have null related statistics 
we have to return all pages
+            return IndexIterator.all(getPageCount());
+          } else {
+            for (int i = 0; i < nullCounts.length; i++) {
+              if (nullCounts[i] > 0) {
+                matchingIndexes1.add(i);
+              }
+            }
+          }
+        }
+      }
+
+      IntSet matchingIndexes2 = new IntOpenHashSet();

Review comment:
       I will think more about what is a better way to implement this 
`visit(In<T> in)`. 

##########
File path: 
parquet-column/src/test/java/org/apache/parquet/internal/filter2/columnindex/TestColumnIndexFilter.java
##########
@@ -410,6 +410,7 @@ public void testFiltering() {
     Set<Integer> set5 = new HashSet<>();
     set5.add(7);
     set5.add(20);
+    System.out.println(in(intColumn("column5"), set5).toString());

Review comment:
       Sorry, removed.

##########
File path: 
parquet-generator/src/main/java/org/apache/parquet/filter2/IncrementallyUpdatedFilterPredicateGenerator.java
##########
@@ -67,15 +67,18 @@ public void run() throws IOException {
     add("package org.apache.parquet.filter2.recordlevel;\n" +
         "\n" +
         "import java.util.List;\n" +
+        "import java.util.Set;\n" +
         "\n" +
         "import org.apache.parquet.hadoop.metadata.ColumnPath;\n" +
         "import org.apache.parquet.filter2.predicate.Operators.Eq;\n" +
         "import org.apache.parquet.filter2.predicate.Operators.Gt;\n" +
         "import org.apache.parquet.filter2.predicate.Operators.GtEq;\n" +
+      "import org.apache.parquet.filter2.predicate.Operators.In;\n" +

Review comment:
       fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> FilterApi support In predicate
> ------------------------------
>
>                 Key: PARQUET-1968
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1968
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> FilterApi should support native In predicate.
> Spark:
> https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala#L600-L605
> Impala:
> https://issues.apache.org/jira/browse/IMPALA-3654



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

Reply via email to