CurtHagenlocher commented on issue #3480: URL: https://github.com/apache/arrow-adbc/issues/3480#issuecomment-3378924147
Thanks for looking! I'm fairly convinced at this point that there's nothing obviously wrong with the driver itself. While I'm waiting to hear back from Snowflake, I'll try to find the time to make a standalone repro. The existing repro happens in our product, which is implemented in C# and about as far from "standalone" as you can imagine. It uses entirely default settings for ingestion.

The input batches are relatively small, but from what I can tell of the driver source, the input batch size is entirely decoupled from the number of rows in the uploaded Parquet files. It eventually uploads 72 individual files. The first 48 are ~13MB in size with ~354k rows each, while the remaining 24 are ~6MB with ~157k rows. This matches my read of the code, which suggests that records are individually queued to a channel and then picked up by one of N readers that build the Parquet files. The default value of N is `runtime.NumCPU`, and this machine does indeed have 24 logical CPUs.

I was surprised that records are apparently queued to the channel one at a time, as I would expect that to have a lot of overhead, but that doesn't really appear to be the case.
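For anyone following along, here's a minimal sketch of the fan-out shape I'm describing, not the driver's actual code: records are sent individually on a single channel and N workers (defaulting to `runtime.NumCPU()`) drain it, each accumulating rows for its own output file. The names (`record`, `fanOutIngest`, `consume`) are mine, invented for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// record stands in for a single ingested row; in the real driver the channel
// carries Arrow data, but the shape of the fan-out is the same.
type record struct {
	id int
}

// fanOutIngest queues every record individually on one channel and lets n
// workers drain it; consume is a stand-in for the per-worker logic that
// would build a Parquet file from the rows it receives.
func fanOutIngest(records []record, n int, consume func(workerID int, in <-chan record)) {
	ch := make(chan record) // one hand-off per record

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			consume(id, ch)
		}(i)
	}

	for _, r := range records {
		ch <- r // records are sent one at a time
	}
	close(ch)
	wg.Wait()
}

func main() {
	n := runtime.NumCPU() // default reader count described above
	records := make([]record, 1_000_000)
	for i := range records {
		records[i] = record{id: i}
	}

	var total int64
	fanOutIngest(records, n, func(workerID int, in <-chan record) {
		var rows int64
		for range in {
			rows++ // a real worker would append the row to its Parquet writer here
		}
		atomic.AddInt64(&total, rows)
		fmt.Printf("worker %d consumed %d rows\n", workerID, rows)
	})
	fmt.Printf("total: %d rows across %d workers\n", total, n)
}
```

Running something like this is roughly how I convinced myself that per-record channel sends aren't the bottleneck; the per-send overhead is small relative to the work each worker does with the row.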
