davisusanibar commented on code in PR #35570:
URL: https://github.com/apache/arrow/pull/35570#discussion_r1318007446


##########
java/dataset/src/test/java/org/apache/arrow/dataset/substrait/TestAceroSubstraitConsumer.java:
##########
@@ -204,4 +205,170 @@ public void testRunBinaryQueryNamedTableNation() throws Exception {
       }
     }
   }
+
+  @Test
+  public void testBaseParquetReadWithExtendedExpressionsFilter() throws Exception {
+    final Schema schema = new Schema(Arrays.asList(
+        Field.nullable("id", new ArrowType.Int(32, true)),
+        Field.nullable("name", new ArrowType.Utf8())
+    ), null);
+    // Substrait Extended Expression: Filter:
+    // Expression 01: WHERE ID < 20
+    String base64EncodedSubstraitFilter = "Ch4IARIaL2Z1bmN0aW9uc19jb21wYXJpc29uLnlhbWwSEhoQCAIQAhoKbHQ6YW55X2F" +
+        "ueRo3ChwaGggCGgQKAhABIggaBhIECgISACIGGgQKAigUGhdmaWx0ZXJfaWRfbG93ZXJfdGhhbl8yMCIaCgJJRAoETkFNRRIOCgQqAhA" +
+        "BCgRiAhABGAI=";
+    ByteBuffer substraitExpressionFilter = getByteBuffer(base64EncodedSubstraitFilter);
+    ParquetWriteSupport writeSupport = ParquetWriteSupport
+        .writeTempFile(AVRO_SCHEMA_USER, TMP.newFolder(), 19, "value_19", 1, "value_1",
+            11, "value_11", 21, "value_21", 45, "value_45");
+    ScanOptions options = new ScanOptions.Builder(/*batchSize*/ 32768)
+        .columns(Optional.empty())
+        .substraitFilter(substraitExpressionFilter)
+        .build();
+    try (
+        DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator(), NativeMemoryPool.getDefault(),
+            FileFormat.PARQUET, writeSupport.getOutputURI());
+        Dataset dataset = datasetFactory.finish();
+        Scanner scanner = dataset.newScan(options);
+        ArrowReader reader = scanner.scanBatches()
+    ) {
+      assertEquals(schema.getFields(), reader.getVectorSchemaRoot().getSchema().getFields());
+      int rowcount = 0;
+      while (reader.loadNextBatch()) {
+        rowcount += reader.getVectorSchemaRoot().getRowCount();
+        assertTrue(reader.getVectorSchemaRoot().getVector("id").toString().equals("[19, 1, 11]"));
+        assertTrue(reader.getVectorSchemaRoot().getVector("name").toString()
+            .equals("[value_19, value_1, value_11]"));

Review Comment:
   For that purpose, a ValueVector would need to be created on the fly and populated with the fixed expected data. I could do that, but it would add a step that is not relevant to the purpose of this unit test. Let me confirm whether this should be added:
   
   ```
         IntVector valueVector = new IntVector("id", rootAllocator());
         valueVector.allocateNew(3);
         valueVector.set(0, 19);
         valueVector.set(1, 1);
         valueVector.set(2, 11);
         valueVector.setValueCount(3);
         ...
         assertEquals(reader.getVectorSchemaRoot().getVector("id").toString(),
             valueVector.toString());
   ```
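   
   A similar minimal sketch could cover the "name" column too, assuming a hand-built VarCharVector (with org.apache.arrow.vector.util.Text values) is an acceptable way to express the expected data; the vector and variable names below are only illustrative and not part of the PR:
   
   ```
         // Illustrative only: expected values for the filtered "name" column, built by hand.
         VarCharVector nameVector = new VarCharVector("name", rootAllocator());
         nameVector.allocateNew(3);
         nameVector.set(0, new Text("value_19"));
         nameVector.set(1, new Text("value_1"));
         nameVector.set(2, new Text("value_11"));
         nameVector.setValueCount(3);
         assertEquals(reader.getVectorSchemaRoot().getVector("name").toString(),
             nameVector.toString());
   ```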


