[GitHub] [arrow-cookbook] pronzato commented on pull request #316: [Java] Document how to convert JDBC Adapter result into a Parquet file

via GitHub Thu, 28 Sep 2023 10:49:23 -0700


pronzato commented on PR #316:
URL: https://github.com/apache/arrow-cookbook/pull/316#issuecomment-1739766832


   Hi David,
   
   When I try to run the JDBCReader example, I get a "URI has empty scheme" error:
   
   java.lang.RuntimeException: URI has empty scheme: '/tmp
   
                   at org.apache.arrow.dataset.file.JniWrapper.writeFromScannerToFile(Native Method)
                   at org.apache.arrow.dataset.file.DatasetFileWriter.write(DatasetFileWriter.java:46)
                   at org.apache.arrow.dataset.file.DatasetFileWriter.write(DatasetFileWriter.java:59)
   
   Any idea what could be causing this?
   
   Regards
   
   GP
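
A possible cause, offered as a sketch rather than a confirmed diagnosis: the message suggests the destination was passed as a bare path ('/tmp) where a URI with an explicit scheme such as file: is expected. The class DatasetUri and the helper toFileUri below are hypothetical names used only for illustration; they rely on the JDK alone and do not call any Arrow API.

```java
import java.net.URI;
import java.nio.file.Paths;

public class DatasetUri {

    // Hypothetical helper (not part of Arrow): converts a local path
    // into a URI string that carries an explicit "file" scheme.
    static String toFileUri(String localPath) {
        return Paths.get(localPath).toAbsolutePath().toUri().toString();
    }

    public static void main(String[] args) {
        String bare = "/tmp";

        // A bare path parses as a URI without a scheme, which matches
        // the wording of the reported error.
        System.out.println("scheme of bare path: " + URI.create(bare).getScheme());

        // The converted form starts with "file:", so the scheme is no longer empty.
        System.out.println("converted: " + toFileUri(bare));
    }
}
```

Assuming the dataset writer takes its destination as a URI string, passing the converted value instead of the bare path may be enough to avoid the exception.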
   
   
   
   On Fri, Sep 15, 2023, 5:05 PM David Li ***@***.***> wrote:
   
   > ***@***.**** commented on this pull request.
   > ------------------------------
   >
   > In java/source/jdbc.rst
   > <https://github.com/apache/arrow-cookbook/pull/316#discussion_r1327786069>
   > :
   >
   > > +
   > +      @Override
   > +      protected Schema readSchema() throws IOException {
   > +        return null;
   > +      }
   > +
   > +      @Override
   > +      public VectorSchemaRoot getVectorSchemaRoot() throws IOException {
   > +        if (root == null) {
   > +          root = iter.next();
   > +        }
   > +        return root;
   > +      }
   > +    }
   > +
   > +    ((Logger) LoggerFactory.getLogger("org.apache.arrow")).setLevel(Level.TRACE);
   >
   > Why are we fiddling with loggers and adding logback to the example? I
   > don't think we need any of that?
   > ------------------------------
   >
   > In java/source/jdbc.rst
   > <https://github.com/apache/arrow-cookbook/pull/316#discussion_r1327786570>
   > :
   >
   > > +    import org.apache.arrow.dataset.scanner.ScanOptions;
   > +    import org.apache.arrow.dataset.scanner.Scanner;
   > +    import org.apache.arrow.dataset.source.Dataset;
   > +    import org.apache.arrow.dataset.source.DatasetFactory;
   > +    import org.apache.arrow.memory.BufferAllocator;
   > +    import org.apache.arrow.memory.RootAllocator;
   > +    import org.apache.arrow.vector.VectorSchemaRoot;
   > +    import org.apache.arrow.vector.ipc.ArrowReader;
   > +    import org.apache.arrow.vector.types.pojo.Schema;
   > +    import org.apache.ibatis.jdbc.ScriptRunner;
   > +    import org.slf4j.LoggerFactory;
   > +
   > +    import ch.qos.logback.classic.Level;
   > +    import ch.qos.logback.classic.Logger;
   > +
   > +    class JDBCReader extends ArrowReader {
   >
   > Explain that we need this because writing a dataset takes an ArrowReader,
   > so we have to adapt the JDBC ArrowVectorIterator to the ArrowReader
   > interface
   > ------------------------------
   >
   > In java/source/jdbc.rst
   > <https://github.com/apache/arrow-cookbook/pull/316#discussion_r1327787518>
   > :
   >
   > > +        final BufferAllocator allocatorParquetWrite = allocator.newChildAllocator("allocatorParquetWrite", 0,
   > +            Long.MAX_VALUE);
   > +        final Connection connection = DriverManager.getConnection(
   > +            "jdbc:h2:mem:h2-jdbc-adapter")
   > +    ) {
   > +      ScriptRunner runnerDDLDML = new ScriptRunner(connection);
   > +      runnerDDLDML.setLogWriter(null);
   > +      runnerDDLDML.runScript(new BufferedReader(
   > +          new FileReader("./thirdpartydeps/jdbc/h2-ddl.sql")));
   > +      runnerDDLDML.runScript(new BufferedReader(
   > +          new FileReader("./thirdpartydeps/jdbc/h2-dml.sql")));
   > +      JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocatorJDBC,
   > +          JdbcToArrowUtils.getUtcCalendar())
   > +          .setTargetBatchSize(2)
   > +          .setReuseVectorSchemaRoot(true)
   > +          .setArraySubTypeByColumnNameMap(
   >
   > In the interest of keeping examples concise, let's use sample data that
   > doesn't require us to deal with all of this in the first place.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
