danepitkin commented on code in PR #35570:
URL: https://github.com/apache/arrow/pull/35570#discussion_r1310926435


##########
java/dataset/src/main/cpp/jni_wrapper.cc:
##########
@@ -458,8 +467,8 @@ JNIEXPORT void JNICALL Java_org_apache_arrow_dataset_jni_JniWrapper_closeDataset
  * Signature: (J[Ljava/lang/String;JJ)J
  */
 JNIEXPORT jlong JNICALL Java_org_apache_arrow_dataset_jni_JniWrapper_createScanner(
-    JNIEnv* env, jobject, jlong dataset_id, jobjectArray columns, jlong batch_size,
-    jlong memory_pool_id) {
+    JNIEnv* env, jobject, jlong dataset_id, jobjectArray columns,
+    jobject substrait_extended_expression, jlong batch_size, jlong memory_pool_id) {

Review Comment:
   Instead of passing a single `substrait_extended_expression`, can we pass
separate `filter` and `projection` parameters?



##########
java/dataset/src/main/java/org/apache/arrow/dataset/scanner/ScanOptions.java:
##########
@@ -69,4 +73,106 @@ public Optional<String[]> getColumns() {
   public long getBatchSize() {
     return batchSize;
   }
+
+  private ByteBuffer getProjection() {

Review Comment:
   `getProjection` and `getFilter` should return `Optional` values, like
`getColumns`, if we remove `getSubstraitExtendedExpression` (see below).
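
   To make the suggestion concrete, here is a minimal sketch of getters that
return `Optional`, in the same way `getColumns` does. The class name
`OptionalGettersSketch` and its constructor are hypothetical and exist only for
illustration; only the `projection` and `filter` field names come from the diff.

```java
import java.nio.ByteBuffer;
import java.util.Optional;

// Hypothetical sketch, not the PR's code: only the field names come from the diff.
class OptionalGettersSketch {
  private final ByteBuffer projection; // null when no projection was set
  private final ByteBuffer filter;     // null when no filter was set

  OptionalGettersSketch(ByteBuffer projection, ByteBuffer filter) {
    this.projection = projection;
    this.filter = filter;
  }

  // Like getColumns(): an unset value surfaces as Optional.empty(), never as null.
  public Optional<ByteBuffer> getProjection() {
    return Optional.ofNullable(projection);
  }

  public Optional<ByteBuffer> getFilter() {
    return Optional.ofNullable(filter);
  }
}
```

   With this shape, callers can use `isPresent()` or `orElse(...)` instead of
null checks.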



##########
java/dataset/src/main/java/org/apache/arrow/dataset/scanner/ScanOptions.java:
##########
@@ -27,6 +28,9 @@
 public class ScanOptions {
   private final Optional<String[]> columns;
   private final long batchSize;
+  private ByteBuffer projection;
+  private ByteBuffer filter;

Review Comment:
   Should `projection` and `filter` be `final`, given that we have a builder for
this object? I think we want the object to be immutable after creation.
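
   A rough sketch of what I mean, assuming the object is only ever constructed
from its builder. `ImmutableOptionsSketch` and its accessor names are
hypothetical; the point is only that the fields are `final` and assigned once.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: every field is final and is assigned exactly once,
// from the builder, inside a private constructor.
class ImmutableOptionsSketch {
  private final ByteBuffer projection;
  private final ByteBuffer filter;

  private ImmutableOptionsSketch(Builder builder) {
    this.projection = builder.projection;
    this.filter = builder.filter;
  }

  ByteBuffer projection() {
    return projection;
  }

  ByteBuffer filter() {
    return filter;
  }

  static class Builder {
    private ByteBuffer projection; // mutable only while building
    private ByteBuffer filter;

    Builder projection(ByteBuffer projection) {
      this.projection = projection;
      return this;
    }

    Builder filter(ByteBuffer filter) {
      this.filter = filter;
      return this;
    }

    ImmutableOptionsSketch build() {
      return new ImmutableOptionsSketch(this);
    }
  }
}
```

   Note that `final` only prevents reassigning the references. A `ByteBuffer`
still has mutable position and contents, so this is immutability of the options
object, not of the buffers it holds.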



##########
java/dataset/src/test/java/org/apache/arrow/dataset/substrait/TestAceroSubstraitConsumer.java:
##########
@@ -204,4 +205,93 @@ public void testRunBinaryQueryNamedTableNation() throws Exception {
       }
     }
   }
+

Review Comment:
   IMO it would be nice to see separate tests for `filter` and `projection` 
functionality. 



##########
java/dataset/src/main/java/org/apache/arrow/dataset/scanner/ScanOptions.java:
##########
@@ -69,4 +83,8 @@ public Optional<String[]> getColumns() {
   public long getBatchSize() {
     return batchSize;
   }
+
+  public ByteBuffer getSubstraitExtendedExpression() {

Review Comment:
   Hey @davisusanibar, I think we should remove `getProjectionAndFilter` and
`getSubstraitExtendedExpression`. If users want both, they can set `filter` and
`projection` separately.



##########
java/dataset/src/main/java/org/apache/arrow/dataset/scanner/ScanOptions.java:
##########
@@ -69,4 +73,106 @@ public Optional<String[]> getColumns() {
   public long getBatchSize() {
     return batchSize;
   }
+
+  private ByteBuffer getProjection() {
+    return projection;
+  }
+
+  private ByteBuffer getFilter() {
+    return filter;
+  }
+
+  private ByteBuffer getProjectionAndFilter() {
+    return projectionAndFilter;
+  }
+
+  /**
+   * To evaluate what option was used to define Substrait Extended Expression (Project/Filter).
+   *
+   * @return Substrait Extended Expression configured for project new columns and/or apply filter
+   */
+  public ByteBuffer getSubstraitExtendedExpression() {
+    if (getProjection() != null) {
+      return getProjection();
+    } else if (getFilter() != null) {
+      return getFilter();
+    } else if (getProjectionAndFilter() != null) {
+      return getProjectionAndFilter();
+    } else {
+      return null;
+    }
+  }
+
+  /**
+   * Builder for Options used during scanning.
+   */
+  public static class Builder {
+    private final long batchSize;
+    private final Optional<String[]> columns;
+    private ByteBuffer projection;
+    private ByteBuffer filter;
+    private ByteBuffer projectionAndFilter;
+
+    /**
+     * Constructor.
+     * @param batchSize Maximum row number of each returned {@link org.apache.arrow.vector.ipc.message.ArrowRecordBatch}
+     * @param columns (Optional) Projected columns. {@link Optional#empty()} for scanning all columns. Otherwise,
+     *                Only columns present in the Array will be scanned.
+     */
+    public Builder(long batchSize, Optional<String[]> columns) {

Review Comment:
   Should a Builder API only enforce mandatory args in its constructor (e.g. 
`batchSize`)? `columns` is optional and can have its own builder method.
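
   Something along these lines, as a sketch only. `ScanOptionsBuilderSketch` is
a hypothetical stand-in for `ScanOptions`, and the `columns(...)` builder method
is an assumed name, not an existing API.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for ScanOptions; illustrates the builder shape only.
class ScanOptionsBuilderSketch {
  private final long batchSize;
  private final Optional<String[]> columns;

  private ScanOptionsBuilderSketch(Builder builder) {
    this.batchSize = builder.batchSize;
    this.columns = builder.columns;
  }

  public long getBatchSize() {
    return batchSize;
  }

  public Optional<String[]> getColumns() {
    return columns;
  }

  static class Builder {
    private final long batchSize;                          // mandatory
    private Optional<String[]> columns = Optional.empty(); // default: scan all columns

    // Only the mandatory argument goes through the constructor.
    Builder(long batchSize) {
      this.batchSize = batchSize;
    }

    // Optional settings get their own chained method (assumed name).
    Builder columns(Optional<String[]> columns) {
      this.columns = Objects.requireNonNull(columns, "columns");
      return this;
    }

    ScanOptionsBuilderSketch build() {
      return new ScanOptionsBuilderSketch(this);
    }
  }
}
```

   Usage would then look like `new Builder(32768).build()` for the default
case, with `.columns(Optional.of(new String[] {"id"}))` chained in only when
needed. `projection` and `filter` could follow the same pattern.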



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.