WeCodingNow opened a new issue, #931:
URL: https://github.com/apache/arrow-java/issues/931

   ### Describe the enhancement requested
   
   Arrow Flight SQL has a feature for ingesting massive datasets, Bulk 
Ingestion: https://github.com/apache/arrow/issues/38255
   It would be beneficial to use those special RPC methods for batched prepared 
statement calls when the prepared statement is strictly for inserting data.
   E.g. when Spark is used for writing data, it generates a simple SQL query 
like "INSERT INTO table(field1, field2, ...) VALUES (?, ?, ...)", creates a 
prepared statement, and then uses the prepared statement update RPC method to 
insert the rows of the dataset. If this feature is implemented, the driver 
could use `DoPut(CommandStatementIngest)` instead.
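
   For reference, the write path described above looks roughly like the following from the JDBC side. This is only a minimal sketch of the generic JDBC batching pattern, not Spark's actual code; the class and method names (`BatchedInsertSketch`, `insertRows`) are made up for illustration, and only standard `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical illustration; not part of Spark or arrow-java. */
final class BatchedInsertSketch {

  /**
   * Binds every row to the prepared INSERT and sends them as one batch.
   * Returns the number of batched statements the driver reported on.
   */
  static int insertRows(Connection conn, String sql, List<Object[]> rows)
      throws SQLException {
    // e.g. sql = "INSERT INTO table(field1, field2) VALUES (?, ?)"
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      for (Object[] row : rows) {
        for (int i = 0; i < row.length; i++) {
          ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
        }
        ps.addBatch(); // queue this row's parameter set
      }
      // With the Flight SQL driver this is where the prepared statement
      // update RPC would be issued; bulk ingestion would replace this step.
      return ps.executeBatch().length;
    }
  }
}
```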
   
   Some Arrow Flight SQL server implementations work like this: when 
a `DoAction(ActionCreatePreparedStatementRequest)` is executed, the server 
creates up to two versions of the prepared statement's underlying data 
structure. One is a handle to a full-scale query engine execution procedure 
(e.g. DataFusion's logical plan), and the other is a handle to a very simple 
procedure that just stores the received record batches in storage. Of course, 
the second procedure can only be generated when the query has a certain form, 
like the one used by Spark.
   Instead, I think it should be possible to move this logic (parsing the 
query, then deciding whether to use the optimized procedure) into the client.
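
   To make the client-side decision concrete, here is a rough sketch of how a driver might recognize the simple INSERT form. It is an assumption about one possible approach, not existing arrow-java code: `SimpleInsertDetector`, `IngestTarget`, and `detect` are hypothetical names, and a regex is used only to keep the example short (a real driver would probably want a proper SQL parser). Anything that does not match, including quoted identifiers, falls back to the regular prepared statement path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of arrow-java. */
final class SimpleInsertDetector {

  // Matches: INSERT INTO table(col1, col2, ...) VALUES (?, ?, ...)
  private static final Pattern SIMPLE_INSERT = Pattern.compile(
      "\\s*INSERT\\s+INTO\\s+([A-Za-z_][A-Za-z0-9_.]*)\\s*"
          + "\\(([^()]*)\\)\\s*VALUES\\s*\\(([^()]*)\\)\\s*;?\\s*",
      Pattern.CASE_INSENSITIVE);

  private static final Pattern IDENTIFIER =
      Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  /** Table and column names that a bulk ingestion call would target. */
  static final class IngestTarget {
    final String table;
    final List<String> columns;

    IngestTarget(String table, List<String> columns) {
      this.table = table;
      this.columns = columns;
    }
  }

  /** Returns the ingestion target, or empty if the query is not a simple insert. */
  static Optional<IngestTarget> detect(String sql) {
    Matcher m = SIMPLE_INSERT.matcher(sql);
    if (!m.matches()) {
      return Optional.empty();
    }
    List<String> columns = new ArrayList<>();
    for (String raw : m.group(2).split(",", -1)) {
      String name = raw.trim();
      if (!IDENTIFIER.matcher(name).matches()) {
        return Optional.empty(); // empty, quoted, or otherwise unusual column
      }
      columns.add(name);
    }
    String[] values = m.group(3).split(",", -1);
    if (values.length != columns.size()) {
      return Optional.empty(); // column count and placeholder count differ
    }
    for (String value : values) {
      if (!value.trim().equals("?")) {
        return Optional.empty(); // literals or expressions need the full engine
      }
    }
    return Optional.of(new IngestTarget(m.group(1), columns));
  }
}
```

   When `detect` returns a target, the driver could send the batched rows through the bulk ingestion path; otherwise it would keep using the prepared statement update call as it does today.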
   
   The use case for this integration: developers of Arrow Flight SQL servers 
could implement bulk ingestion command handlers and avoid implementing special 
logic for handling batched inserts. The client would then enable a new driver 
option that lets the driver decide to use the bulk ingestion RPC methods for 
inserting data.
   
   

