davlee1972 commented on issue #1322:
URL: https://github.com/apache/arrow-adbc/issues/1322#issuecomment-1839166597

   The original source is CSV files, and every column in the schema is just a string type.
   
   I can confirm that when reading the CSV files with Arrow's multithreaded reader, you end up with a pyarrow Table / RecordBatchReader whose record batch lengths vary (12k, 13k, 114k, etc.).
   
   The ADBC ingest function then sends one Snowflake parameterized `INSERT ... VALUES (?, ?, ?, ?)` array bind per record batch, which is why the query history log shows inserts of 12k, 13k, 114k, etc. rows.
   
   By merging the record batches into batches of 1 million rows before calling ADBC ingest, I can see 1-million-row bind inserts in Snowflake.
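   
   For reference, here is a minimal sketch of that rebatching step, assuming pyarrow and the adbc-driver-snowflake DBAPI bindings; the connection URI, file name, and table name below are placeholders:
   
   ```python
   import pyarrow as pa
   import pyarrow.csv as pacsv
   import adbc_driver_snowflake.dbapi
   
   # Multithreaded CSV read; the reader picks its own chunk sizes (12k, 13k, 114k, ...).
   table = pacsv.read_csv("data.csv")
   
   # Consolidate the small chunks, then re-split into ~1,000,000-row record batches
   # so each Snowflake array bind insert covers a full 1M-row batch.
   table = table.combine_chunks()
   batches = table.to_batches(max_chunksize=1_000_000)
   reader = pa.RecordBatchReader.from_batches(table.schema, batches)
   
   # Placeholder connection URI and target table name.
   with adbc_driver_snowflake.dbapi.connect("snowflake://user:pass@account/db/schema") as conn:
       with conn.cursor() as cur:
           cur.adbc_ingest("target_table", reader, mode="create")
       conn.commit()
   ```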
   
   Each bind insert takes about 5 seconds, so 36 bind inserts take roughly 3 minutes.
   
   Before, with ~3000 record batches, the same ingest was taking 3 hours.
