zeroshade commented on issue #2084:
URL: https://github.com/apache/arrow-adbc/issues/2084#issuecomment-2297764006

   `cursor.adbc_ingest` uses `CREATE STAGE` and `COPY INTO ... FROM 
@stage ...` for efficient bulk loading of data, as documented at 
https://docs.snowflake.com/en/user-guide/data-load-local-file-system. It will 
be *significantly* more performant than using `INSERT INTO` with bind 
parameters due to the way Snowflake's API works. 
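   
   For reference, a minimal sketch of the Python side (the URI, table name, 
and data are placeholders; `mode` can also be `"append"`, `"replace"`, or 
`"create_append"`):
   
   ```python
   import pyarrow as pa
   import adbc_driver_snowflake.dbapi
   
   # Placeholder connection URI -- substitute your own account details.
   uri = "user:password@account/database/schema?warehouse=wh"
   
   data = pa.table({"id": pa.array([1, 2, 3], pa.int64()),
                    "name": pa.array(["a", "b", "c"], pa.string())})
   
   with adbc_driver_snowflake.dbapi.connect(uri) as conn:
       with conn.cursor() as cur:
           # CREATE STAGE + COPY INTO happen under the hood; no INSERT
           # INTO with bind parameters is involved.
           cur.adbc_ingest("my_table", data, mode="create")
       conn.commit()
   ```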
   
   > Essentially my use case is akin to classic ETL: stream in batches of data 
(SELECT), transform, stream back (INSERT). While keeping it all under strict 
memory requirements (for example 4x16MB batches max at the same time).
   
   That's precisely what the `adbc_ingest` functionality is designed to be 
optimal for. The record batches are written to Parquet files in parallel (both 
the level of concurrency and the size of the Parquet files are configurable), 
and those files are uploaded directly to a Snowflake stage before being loaded 
into the table. (Again, this follows the steps in the above-linked 
documentation, but with concurrency.)
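   
   For a strict memory budget like yours, a sketch of tuning those knobs (the 
option keys are the ones documented for the Snowflake driver, and the values 
are illustrative; double-check both against your driver version):
   
   ```python
   import pyarrow as pa
   import adbc_driver_snowflake.dbapi
   
   uri = "user:password@account/database/schema?warehouse=wh"  # placeholder
   data = pa.table({"id": pa.array([1, 2, 3], pa.int64())})
   
   with adbc_driver_snowflake.dbapi.connect(uri) as conn:
       with conn.cursor() as cur:
           # Bound the pipeline: 2 Parquet writers, 2 concurrent uploads,
           # 2 concurrent COPYs, ~16 MiB per Parquet file.
           cur.adbc_statement.set_options(**{
               "adbc.snowflake.statement.ingest_writer_concurrency": "2",
               "adbc.snowflake.statement.ingest_upload_concurrency": "2",
               "adbc.snowflake.statement.ingest_copy_concurrency": "2",
               "adbc.snowflake.statement.ingest_target_file_size":
                   str(16 * 1024 * 1024),
           })
           cur.adbc_ingest("my_table", data, mode="append")
       conn.commit()
   ```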
   
   > It would be also nice to know where the limitation comes from? Snowflake 
ADBC-server implementation?
   
   The Snowflake ADBC driver uses Snowflake's Go client for communication, 
which does not support any decimal type as a bind parameter, and ADBC 
currently makes no attempt to cast for binding (we can optionally cast on 
receiving). Snowflake's server implementation currently does not accept Arrow 
data directly as bind-parameter input. That's part of why `adbc_ingest` 
uploads Parquet files (aside from the fact that Parquet also provides 
compression etc., which is another reason it's more performant than using 
`INSERT INTO` with bind parameters).
   
   My recommendation would be to use `adbc_ingest` for your inserts if at all 
possible. If you can't, you'll need to cast your decimal data to int64/float64 
in the Arrow record batches before you bind the stream; a sketch follows. 
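   
   Something along these lines (a minimal sketch using pyarrow; `reader` 
stands in for the `RecordBatchReader` from your SELECT/transform step, `cur` 
for an open cursor able to bind Arrow data to `executemany`, and note that 
casting decimals to float64 can lose precision):
   
   ```python
   import pyarrow as pa
   
   def bindable_schema(schema: pa.Schema) -> pa.Schema:
       # Decimal columns become float64; everything else is unchanged.
       return pa.schema(
           [f.with_type(pa.float64()) if pa.types.is_decimal(f.type) else f
            for f in schema]
       )
   
   def cast_batch(batch: pa.RecordBatch, schema: pa.Schema) -> pa.RecordBatch:
       arrays = [col.cast(f.type) for col, f in zip(batch.columns, schema)]
       return pa.RecordBatch.from_arrays(arrays, schema=schema)
   
   schema = bindable_schema(reader.schema)
   # Wrap the stream so batches are cast lazily, one at a time, keeping
   # memory bounded instead of materializing everything up front.
   casted = pa.RecordBatchReader.from_batches(
       schema, (cast_batch(b, schema) for b in reader)
   )
   cur.executemany("INSERT INTO my_table VALUES (?, ?)", casted)
   ```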

