danepitkin commented on code in PR #316:
URL: https://github.com/apache/arrow-cookbook/pull/316#discussion_r1281062018
##########
java/source/dataset.rst:
##########
@@ -533,4 +533,10 @@ Let's read a CSV file.
Salesforce Slack 27.7 01/12/2020
Total batch size: 3
-.. _Arrow Java Dataset: https://arrow.apache.org/docs/dev/java/dataset.html
\ No newline at end of file
+.. _Arrow Java Dataset: https://arrow.apache.org/docs/dev/java/dataset.html
+
+
+Write Parquet Files
+===================
+
+Go to :doc:`JDBC Adapter - Write ResultSet to Parquet File <jdbc>` for an example.
Review Comment:
Ideally we would have a Parquet example here that doesn't include things
like JDBC. Do you think it would be best to add it as part of this PR?
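For reference, a minimal JDBC-free write might look something like this
sketch. `DatasetFileWriter` accepts any `ArrowReader`, so an existing Arrow
IPC file can be rewritten as Parquet directly. The input and output paths are
hypothetical, and assume this runs in a method declaring `throws Exception`:

```java
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

import org.apache.arrow.dataset.file.DatasetFileWriter;
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowFileReader;

// Sketch only: rewrite an Arrow IPC file as Parquet, no JDBC involved.
// Both paths below are hypothetical.
try (BufferAllocator allocator = new RootAllocator();
     FileChannel channel = FileChannel.open(
         Paths.get("/tmp/input.arrow"), StandardOpenOption.READ);
     ArrowFileReader reader = new ArrowFileReader(channel, allocator)) {
  // ArrowFileReader is an ArrowReader, so it can feed DatasetFileWriter.
  DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
      "file:///tmp/parquet-output");
}
```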
##########
java/source/jdbc.rst:
##########
@@ -307,3 +307,167 @@ values to the given scale.
102 true 100000000030.0000000 some char text [1,2]
INT_FIELD1 BOOL_FIELD2 BIGINT_FIELD5 CHAR_FIELD16 LIST_FIELD19
103 true 10000000003.0000000 some char text [1]
+
+Write ResultSet to Parquet File
+===============================
+
+In this example, we take the results from the JDBC adapter and write them
+to a Parquet file.
+
+.. testcode::
+
+ import java.io.BufferedReader;
+ import java.io.FileReader;
+ import java.io.IOException;
+ import java.nio.file.DirectoryStream;
+ import java.nio.file.Files;
+ import java.nio.file.Path;
+ import java.sql.Connection;
+ import java.sql.DriverManager;
+ import java.sql.ResultSet;
+ import java.sql.SQLException;
+ import java.sql.Types;
+ import java.util.HashMap;
+
+ import org.apache.arrow.adapter.jdbc.ArrowVectorIterator;
+ import org.apache.arrow.adapter.jdbc.JdbcFieldInfo;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrow;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrowUtils;
+ import org.apache.arrow.dataset.file.DatasetFileWriter;
+ import org.apache.arrow.dataset.file.FileFormat;
+ import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+ import org.apache.arrow.dataset.jni.NativeMemoryPool;
+ import org.apache.arrow.dataset.scanner.ScanOptions;
+ import org.apache.arrow.dataset.scanner.Scanner;
+ import org.apache.arrow.dataset.source.Dataset;
+ import org.apache.arrow.dataset.source.DatasetFactory;
+ import org.apache.arrow.memory.BufferAllocator;
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.VectorLoader;
+ import org.apache.arrow.vector.VectorSchemaRoot;
+ import org.apache.arrow.vector.VectorUnloader;
+ import org.apache.arrow.vector.ipc.ArrowReader;
+ import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+ import org.apache.arrow.vector.types.pojo.Schema;
+ import org.apache.ibatis.jdbc.ScriptRunner;
+
+ class JDBCReader extends ArrowReader {
+   private final ArrowVectorIterator iter;
+   private final Schema schema;
+
+   public JDBCReader(BufferAllocator allocator, ArrowVectorIterator iter, Schema schema) {
+     super(allocator);
+     this.iter = iter;
+     this.schema = schema;
+   }
+
+   @Override
+   public boolean loadNextBatch() throws IOException {
+     while (iter.hasNext()) {
+       try (VectorSchemaRoot rootTmp = iter.next()) {
+         if (rootTmp.getRowCount() > 0) {
+           VectorUnloader unloader = new VectorUnloader(rootTmp);
+           VectorLoader loader = new VectorLoader(super.getVectorSchemaRoot());
+           try (ArrowRecordBatch recordBatch = unloader.getRecordBatch()) {
+             loader.load(recordBatch);
+           }
+           return true;
+         } else {
+           return false;
+         }
+       }
+     }
+     return false;
+   }
+
+   @Override
+   public long bytesRead() {
+     return 0;
+   }
+
+   @Override
+   protected void closeReadSource() throws IOException {
+   }
+
+   @Override
+   protected Schema readSchema() {
+     return schema;
+   }
+ }
Review Comment:
Could we add whitespace to the code below so it's organized into sections? I
think it will be easier to read.
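For example, the writing code might be sectioned roughly like this (a sketch
only, not the PR's actual code; the H2 URL, script path, query, and output
URI are assumptions, and the imports are the ones quoted above):

```java
// Assume this runs in a method declaring throws Exception.
try (BufferAllocator allocator = new RootAllocator();
     Connection connection = DriverManager.getConnection(
         "jdbc:h2:mem:h2-jdbc-adapter")) {

  // Section 1: JDBC setup - create and populate the tables the query reads.
  ScriptRunner runner = new ScriptRunner(connection);
  runner.setLogWriter(null);
  runner.runScript(new BufferedReader(
      new FileReader("./thirdpartydeps/jdbc/h2-ddl.sql")));

  // Section 2: ResultSet -> Arrow conversion.
  JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocator,
      JdbcToArrowUtils.getUtcCalendar())
      .setTargetBatchSize(2)
      .setArraySubTypeByColumnNameMap(new HashMap<String, JdbcFieldInfo>() {{
        put("LIST_FIELD19", new JdbcFieldInfo(Types.INTEGER));
      }})
      .build();
  ResultSet resultSet = connection.createStatement()
      .executeQuery("SELECT * FROM TABLE1");
  Schema schema = JdbcToArrowUtils.jdbcToArrowSchema(
      resultSet.getMetaData(), config);

  // Section 3: stream the batches into a Parquet file.
  try (ArrowVectorIterator iterator =
           JdbcToArrow.sqlToArrowVectorIterator(resultSet, config);
       ArrowReader reader = new JDBCReader(allocator, iterator, schema)) {
    DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
        "file:///tmp/parquet-output");
  }
}
```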
##########
java/source/jdbc.rst:
##########
@@ -307,3 +307,167 @@ values to the given scale.
102 true 100000000030.0000000 some char text [1,2]
INT_FIELD1 BOOL_FIELD2 BIGINT_FIELD5 CHAR_FIELD16 LIST_FIELD19
103 true 10000000003.0000000 some char text [1]
+
+Write ResultSet to Parquet File
+===============================
+
+In this example, we take the results from the JDBC adapter and write them
+to a Parquet file.
Review Comment:
Hmm, was this specific example requested? I think it would be better to
include a minimal read/write parquet example in `dataset.rst` and remove this
one. The `jdbc.rst` already has an example for converting `ResultSet` to
`VectorSchemaRoot`. What do you think?
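For the read side, a minimal scan in the style `dataset.rst` already uses
could look like this (the URI assumes the Parquet output written above, and
the block assumes a method declaring `throws Exception`):

```java
// Sketch: scan the written Parquet file back and print each batch.
String uri = "file:///tmp/parquet-output";
ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
try (BufferAllocator allocator = new RootAllocator();
     DatasetFactory datasetFactory = new FileSystemDatasetFactory(allocator,
         NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
     Dataset dataset = datasetFactory.finish();
     Scanner scanner = dataset.newScan(options);
     ArrowReader reader = scanner.scanBatches()) {
  while (reader.loadNextBatch()) {
    try (VectorSchemaRoot root = reader.getVectorSchemaRoot()) {
      System.out.print(root.contentToTSVString());
    }
  }
}
```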
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]