Re: [PR] GH-40055: [Java][Docs] Simplify use of Filter and Expression into Dataset Substrait [arrow]

via GitHub Tue, 13 Feb 2024 02:49:13 -0800


davisusanibar commented on code in PR #40056:
URL: https://github.com/apache/arrow/pull/40056#discussion_r1487610459



##########
docs/source/java/substrait.rst:
##########
@@ -148,297 +136,60 @@ This Java program:
     import org.apache.arrow.memory.BufferAllocator;
     import org.apache.arrow.memory.RootAllocator;
     import org.apache.arrow.vector.ipc.ArrowReader;
+    import org.apache.calcite.sql.parser.SqlParseException;
+
+    import java.nio.ByteBuffer;
+    import java.util.Base64;
+    import java.util.Optional;
 
     public class ClientSubstraitExtendedExpressionsCookbook {
 
-      public static void main(String[] args) throws Exception {
-        // project and filter dataset using extended expression definition - 03 Expressions:
-        // Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3
-        // Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10
-        // Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18
+      public static void main(String[] args) throws SqlParseException {
         projectAndFilterDataset();
       }
 
-      public static void projectAndFilterDataset() {
+      private static void projectAndFilterDataset() throws SqlParseException {
         String uri = "file:///Users/data/tpch_parquet/nation.parquet";
-        ScanOptions options = new ScanOptions.Builder(/*batchSize*/ 32768)
-            .columns(Optional.empty())
-            .substraitFilter(getSubstraitExpressionFilter())
-            .substraitProjection(getSubstraitExpressionProjection())
-            .build();
-        try (
-            BufferAllocator allocator = new RootAllocator();
-            DatasetFactory datasetFactory = new FileSystemDatasetFactory(
-                allocator, NativeMemoryPool.getDefault(),
-                FileFormat.PARQUET, uri);
-            Dataset dataset = datasetFactory.finish();
-            Scanner scanner = dataset.newScan(options);
-            ArrowReader reader = scanner.scanBatches()
-        ) {
+        ScanOptions options =
+            new ScanOptions.Builder(/*batchSize*/ 32768)
+                .columns(Optional.empty())
+                .substraitFilter(getByteBuffer(new String[]{"N_NATIONKEY > 18"}))
+                .substraitProjection(getByteBuffer(new String[]{"N_REGIONKEY + 10",
+                    "N_NAME || CAST(' - ' as VARCHAR) || N_COMMENT"}))
+                .build();
+        try (BufferAllocator allocator = new RootAllocator();
+             DatasetFactory datasetFactory =
+                 new FileSystemDatasetFactory(
+                     allocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
+             Dataset dataset = datasetFactory.finish();
+             Scanner scanner = dataset.newScan(options);
+             ArrowReader reader = scanner.scanBatches()) {
           while (reader.loadNextBatch()) {
-            System.out.println(
-                reader.getVectorSchemaRoot().contentToTSVString());
+            System.out.println(reader.getVectorSchemaRoot().contentToTSVString());

Review Comment:
   I'll look into how a linter can be configured or implemented for the Java code.



