[ 
https://issues.apache.org/jira/browse/ARROW-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123484#comment-17123484
 ] 

Uwe Korn commented on ARROW-4144:
---------------------------------

Yes, the use case would be to write large {{pandas.DataFrame}} objects to a 
database layer that only has performant JDBC drivers. Personally, all my JDBC 
sources are read-only, so I didn't write a WriteToJDBC function, but other 
people will use these technologies with broader access rights. I have used the 
"pyarrow -> Arrow Java -> JDBC" path successfully with Apache Drill and Denodo. 
I have also heard that some people use it with Amazon Athena, where a 
performant INSERT might be interesting 
[https://docs.aws.amazon.com/athena/latest/ug/insert-into.html], as the JDBC 
driver currently seems to be the most performant option.

> [Java] Arrow-to-JDBC
> --------------------
>
>                 Key: ARROW-4144
>                 URL: https://issues.apache.org/jira/browse/ARROW-4144
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Michael Pigott
>            Assignee: Chen
>            Priority: Major
>
> ARROW-1780 reads a query from a JDBC data source and converts the ResultSet 
> to an Arrow VectorSchemaRoot.  However, there is no built-in adapter for 
> writing an Arrow VectorSchemaRoot back to the database.
> ARROW-3966 adds JDBC field metadata:
>  * The Catalog Name
>  * The Table Name
>  * The Field Name
>  * The Field Type
> We can use this information to ask for the field information from the 
> database via the 
> [DatabaseMetaData|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html]
>  object.  We can then create INSERT or UPDATE statements based on the [list 
> of primary 
> keys|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getPrimaryKeys(java.lang.String,%20java.lang.String,%20java.lang.String)]
>  in the table:
>  * If the value in the VectorSchemaRoot corresponding to the primary key is 
> NULL, insert that record into the database.
>  * If the value in the VectorSchemaRoot corresponding to the primary key is 
> not NULL, update the existing record in the database.
> We can also perform the same data conversion in reverse based on the field 
> types queried from the database.
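
To make the primary-key rule in the description above concrete, here is a minimal Java sketch. It is not the implementation proposed in the issue and not part of Arrow's JDBC adapter: the class {{ArrowToJdbcSketch}} and its method names are hypothetical. It assumes the database generates key values on insert (so key columns are left out of the INSERT), that a table without a primary key is always inserted into, and that table and column names are trusted identifiers needing no quoting. Only {{DatabaseMetaData.getPrimaryKeys}} and its {{COLUMN_NAME}} result column are standard JDBC.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of Apache Arrow. */
final class ArrowToJdbcSketch {

    /** Collects the primary-key column names of a table from JDBC metadata. */
    public static List<String> primaryKeys(DatabaseMetaData meta, String catalog,
                                           String schema, String table) throws SQLException {
        List<String> keys = new ArrayList<>();
        try (ResultSet rs = meta.getPrimaryKeys(catalog, schema, table)) {
            while (rs.next()) {
                keys.add(rs.getString("COLUMN_NAME"));
            }
        }
        return keys;
    }

    /** Builds a parameterized INSERT over the non-key columns (assumes generated keys). */
    public static String insertSql(String table, List<String> columns, List<String> keys) {
        List<String> cols = columns.stream()
                .filter(c -> !keys.contains(c))
                .collect(Collectors.toList());
        String names = String.join(", ", cols);
        String params = cols.stream().map(c -> "?").collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + params + ")";
    }

    /** Builds a parameterized UPDATE that sets non-key columns and matches on the keys. */
    public static String updateSql(String table, List<String> columns, List<String> keys) {
        String set = columns.stream()
                .filter(c -> !keys.contains(c))
                .map(c -> c + " = ?")
                .collect(Collectors.joining(", "));
        String where = keys.stream()
                .map(k -> k + " = ?")
                .collect(Collectors.joining(" AND "));
        return "UPDATE " + table + " SET " + set + " WHERE " + where;
    }

    /** Applies the rule from the issue: NULL key means INSERT, non-NULL key means UPDATE. */
    public static String statementFor(String table, List<String> columns,
                                      List<String> keys, boolean keyIsNull) {
        if (keyIsNull || keys.isEmpty()) {
            return insertSql(table, columns, keys);
        }
        return updateSql(table, columns, keys);
    }
}
```

In a real adapter the resulting SQL would be passed to {{Connection.prepareStatement}}, with one parameter bound per row of the VectorSchemaRoot and rows grouped into batches; that part is omitted here because it depends on the type mapping discussed in the issue.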



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
