[GitHub] [arrow-cookbook] danepitkin commented on a diff in pull request #316: [Java] Document how to convert JDBC Adapter result into a Parquet file

via GitHub Fri, 11 Aug 2023 13:57:00 -0700


danepitkin commented on code in PR #316:
URL: https://github.com/apache/arrow-cookbook/pull/316#discussion_r1291726736



##########
java/source/dataset.rst:
##########
@@ -533,4 +533,92 @@ Let's read a CSV file.
    Salesforce    Slack    27.7    01/12/2020
    Total batch size: 3
 
-.. _Arrow Java Dataset: https://arrow.apache.org/docs/dev/java/dataset.html
\ No newline at end of file
+.. _Arrow Java Dataset: https://arrow.apache.org/docs/dev/java/dataset.html
+
+
+Write Parquet Files

Review Comment:
   Can we move this to `io.rst`? That's where "Read parquet" is.



##########
java/source/jdbc.rst:
##########
@@ -307,3 +307,191 @@ values to the given scale.
    102    true    100000000030.0000000    some char text      [1,2]
    INT_FIELD1    BOOL_FIELD2    BIGINT_FIELD5    CHAR_FIELD16    LIST_FIELD19
    103    true    10000000003.0000000    some char text      [1]
+
+Write ResultSet to Parquet File
+===============================
+
+As an example, we are trying to write a parquet file from the JDBC adapter results.
+
+.. testcode::
+
+    import java.io.BufferedReader;
+    import java.io.FileReader;
+    import java.io.IOException;
+    import java.nio.file.DirectoryStream;
+    import java.nio.file.Files;
+    import java.nio.file.Path;
+    import java.sql.Connection;
+    import java.sql.DriverManager;
+    import java.sql.ResultSet;
+    import java.sql.SQLException;
+    import java.sql.Types;
+    import java.util.HashMap;
+
+    import org.apache.arrow.adapter.jdbc.ArrowVectorIterator;
+    import org.apache.arrow.adapter.jdbc.JdbcFieldInfo;
+    import org.apache.arrow.adapter.jdbc.JdbcToArrow;
+    import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
+    import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
+    import org.apache.arrow.adapter.jdbc.JdbcToArrowUtils;
+    import org.apache.arrow.dataset.file.DatasetFileWriter;
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.VectorSchemaRoot;
+    import org.apache.arrow.vector.ipc.ArrowReader;
+    import org.apache.arrow.vector.types.pojo.Schema;
+    import org.apache.ibatis.jdbc.ScriptRunner;
+    import org.slf4j.LoggerFactory;
+
+    import ch.qos.logback.classic.Level;
+    import ch.qos.logback.classic.Logger;
+
+    class JDBCReader extends ArrowReader {

Review Comment:
   Could we somehow delete the duplicate code here and reuse the other one? Or combine the two?



##########
java/source/io.rst:
##########
@@ -579,3 +579,95 @@ Reading and writing dictionary-encoded data requires separately tracking the dic
    Dictionary-encoded data recovered: [0, 3, 4, 5, 7]
    Dictionary recovered: Dictionary DictionaryEncoding[id=666,ordered=false,indexType=Int(8, true)] [Andorra, Cuba, Grecia, Guinea, Islandia, Malta, Tailandia, Uganda, Yemen, Zambia]
    Decoded data: [Andorra, Guinea, Islandia, Malta, Uganda]
+
+Customize Logic to Read Dataset

Review Comment:
   Can we move this to `jdbc.rst`? I think it fits better there since it's directly applicable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

