danepitkin commented on code in PR #316:
URL: https://github.com/apache/arrow-cookbook/pull/316#discussion_r1281062018
##########
java/source/dataset.rst:
##########
@@ -533,4 +533,10 @@ Let's read a CSV file.
Salesforce Slack 27.7 01/12/2020
Total batch size: 3
-.. _Arrow Java Dataset: https://arrow.apache.org/docs/dev/java/dataset.html
\ No newline at end of file
+.. _Arrow Java Dataset: https://arrow.apache.org/docs/dev/java/dataset.html
+
+
+Write Parquet Files
+===================
+
+Go to :doc:`JDBC Adapter - Write ResultSet to Parquet File <jdbc>` for an example.
Review Comment:
Ideally we would have a Parquet example here that doesn't include things
like JDBC. Do you think it would be best to add it as part of this PR?
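For reference, a minimal JDBC-free write might look something like this
sketch. `DatasetFileWriter` accepts any `ArrowReader`, so an existing Arrow
IPC file can be rewritten as Parquet directly. The input and output paths are
hypothetical, and assume this runs in a method declaring `throws Exception`:

```java
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

import org.apache.arrow.dataset.file.DatasetFileWriter;
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowFileReader;

// Sketch only: rewrite an Arrow IPC file as Parquet, no JDBC involved.
// Both paths below are hypothetical.
try (BufferAllocator allocator = new RootAllocator();
     FileChannel channel = FileChannel.open(
         Paths.get("/tmp/input.arrow"), StandardOpenOption.READ);
     ArrowFileReader reader = new ArrowFileReader(channel, allocator)) {
  // ArrowFileReader is an ArrowReader, so it can feed DatasetFileWriter.
  DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
      "file:///tmp/parquet-output");
}
```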
##########
java/source/jdbc.rst:
##########
@@ -307,3 +307,167 @@ values to the given scale.
102 true 100000000030.0000000 some char text [1,2]
INT_FIELD1 BOOL_FIELD2 BIGINT_FIELD5 CHAR_FIELD16 LIST_FIELD19
103 true 10000000003.0000000 some char text [1]
+
+Write ResultSet to Parquet File
+===============================
+
+In this example, we take the results from the JDBC adapter and write them
+to a Parquet file.
+
+.. testcode::
+
+ import java.io.BufferedReader;
+ import java.io.FileReader;
+ import java.io.IOException;
+ import java.nio.file.DirectoryStream;
+ import java.nio.file.Files;
+ import java.nio.file.Path;
+ import java.sql.Connection;
+ import java.sql.DriverManager;
+ import java.sql.ResultSet;
+ import java.sql.SQLException;
+ import java.sql.Types;
+ import java.util.HashMap;
+
+ import org.apache.arrow.adapter.jdbc.ArrowVectorIterator;
+ import org.apache.arrow.adapter.jdbc.JdbcFieldInfo;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrow;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrowUtils;
+ import org.apache.arrow.dataset.file.DatasetFileWriter;
+ import org.apache.arrow.dataset.file.FileFormat;
+ import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+ import org.apache.arrow.dataset.jni.NativeMemoryPool;
+ import org.apache.arrow.dataset.scanner.ScanOptions;
+ import org.apache.arrow.dataset.scanner.Scanner;
+ import org.apache.arrow.dataset.source.Dataset;
+ import org.apache.arrow.dataset.source.DatasetFactory;
+ import org.apache.arrow.memory.BufferAllocator;
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.VectorLoader;
+ import org.apache.arrow.vector.VectorSchemaRoot;
+ import org.apache.arrow.vector.VectorUnloader;
+ import org.apache.arrow.vector.ipc.ArrowReader;
+ import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
+ import org.apache.arrow.vector.types.pojo.Schema;
+ import org.apache.ibatis.jdbc.ScriptRunner;
+
+ class JDBCReader extends ArrowReader {
+   private final ArrowVectorIterator iter;
+   private final Schema schema;
+
+   public JDBCReader(BufferAllocator allocator, ArrowVectorIterator iter, Schema schema) {
+     super(allocator);
+     this.iter = iter;
+     this.schema = schema;
+   }
+
+   @Override
+   public boolean loadNextBatch() throws IOException {
+     while (iter.hasNext()) {
+       try (VectorSchemaRoot rootTmp = iter.next()) {
+         if (rootTmp.getRowCount() > 0) {
+           VectorUnloader unloader = new VectorUnloader(rootTmp);
+           VectorLoader loader = new VectorLoader(super.getVectorSchemaRoot());
+           try (ArrowRecordBatch recordBatch = unloader.getRecordBatch()) {
+             loader.load(recordBatch);
+           }
+           return true;
+         } else {
+           return false;
+         }
+       }
+     }
+     return false;
+   }
+
+   @Override
+   public long bytesRead() {
+     return 0;
+   }
+
+   @Override
+   protected void closeReadSource() throws IOException {
+   }
+
+   @Override
+   protected Schema readSchema() {
+     return schema;
+   }
+ }
Review Comment:
Could we add whitespace to the code below so it's organized into sections? I
think it will be easier to read.
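For example, the writing code might be sectioned roughly like this (a sketch
only, not the PR's actual code; the H2 URL, script path, query, and output
URI are assumptions, and the imports are the ones quoted above):

```java
// Assume this runs in a method declaring throws Exception.
try (BufferAllocator allocator = new RootAllocator();
     Connection connection = DriverManager.getConnection(
         "jdbc:h2:mem:h2-jdbc-adapter")) {

  // Section 1: JDBC setup - create and populate the tables the query reads.
  ScriptRunner runner = new ScriptRunner(connection);
  runner.setLogWriter(null);
  runner.runScript(new BufferedReader(
      new FileReader("./thirdpartydeps/jdbc/h2-ddl.sql")));

  // Section 2: ResultSet -> Arrow conversion.
  JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocator,
      JdbcToArrowUtils.getUtcCalendar())
      .setTargetBatchSize(2)
      .setArraySubTypeByColumnNameMap(new HashMap<String, JdbcFieldInfo>() {{
        put("LIST_FIELD19", new JdbcFieldInfo(Types.INTEGER));
      }})
      .build();
  ResultSet resultSet = connection.createStatement()
      .executeQuery("SELECT * FROM TABLE1");
  Schema schema = JdbcToArrowUtils.jdbcToArrowSchema(
      resultSet.getMetaData(), config);

  // Section 3: stream the batches into a Parquet file.
  try (ArrowVectorIterator iterator =
           JdbcToArrow.sqlToArrowVectorIterator(resultSet, config);
       ArrowReader reader = new JDBCReader(allocator, iterator, schema)) {
    DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
        "file:///tmp/parquet-output");
  }
}
```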
##########
java/source/jdbc.rst:
##########
@@ -307,3 +307,167 @@ values to the given scale.
102 true 100000000030.0000000 some char text [1,2]
INT_FIELD1 BOOL_FIELD2 BIGINT_FIELD5 CHAR_FIELD16 LIST_FIELD19
103 true 10000000003.0000000 some char text [1]
+
+Write ResultSet to Parquet File
+===============================
+
+In this example, we take the results from the JDBC adapter and write them
+to a Parquet file.
Review Comment:
Hmm, was this specific example requested? I think it would be better to
include a minimal read/write parquet example in `dataset.rst` and remove this
one. The `jdbc.rst` already has an example for converting `ResultSet` to
`VectorSchemaRoot`. What do you think?
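For the read side, a minimal scan in the style `dataset.rst` already uses
could look like this (the URI assumes the Parquet output written above, and
the block assumes a method declaring `throws Exception`):

```java
// Sketch: scan the written Parquet file back and print each batch.
String uri = "file:///tmp/parquet-output";
ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
try (BufferAllocator allocator = new RootAllocator();
     DatasetFactory datasetFactory = new FileSystemDatasetFactory(allocator,
         NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
     Dataset dataset = datasetFactory.finish();
     Scanner scanner = dataset.newScan(options);
     ArrowReader reader = scanner.scanBatches()) {
  while (reader.loadNextBatch()) {
    try (VectorSchemaRoot root = reader.getVectorSchemaRoot()) {
      System.out.print(root.contentToTSVString());
    }
  }
}
```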
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]