zeroshade commented on issue #3480:
URL: https://github.com/apache/arrow-adbc/issues/3480#issuecomment-3378708958
@CurtHagenlocher So this is interesting. I put together a pure Go test using
the NYC Taxi dataset (you mentioned 20M rows, so that seemed like an easy way
to get a comparable volume of data).
I used a subset of 27,982,347 rows to test bulk ingestion into Snowflake with
the ADBC driver:
* With the default settings, the driver uploaded around 6 files and the total
ingestion took under 1 minute.
* When I artificially forced the incoming data into smaller batches (see the
sketch below), it uploaded roughly 90 smaller files and the ingestion took
just over 1.5 minutes.
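For context, this is roughly how I shrank the batches. The `reChunk` helper below is an illustrative sketch rather than the exact code I ran, and it assumes the arrow-go v18 module path:

```go
import (
	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
)

// reChunk splits every record coming out of src into slices of at most
// maxRows rows (maxRows must be > 0) and wraps the result in a new
// RecordReader. Record.NewSlice shares the underlying buffers, so no data
// is copied.
func reChunk(src array.RecordReader, maxRows int64) (array.RecordReader, error) {
	var chunks []arrow.Record
	for src.Next() {
		rec := src.Record()
		for start := int64(0); start < rec.NumRows(); start += maxRows {
			end := start + maxRows
			if end > rec.NumRows() {
				end = rec.NumRows()
			}
			chunks = append(chunks, rec.NewSlice(start, end))
		}
	}
	if err := src.Err(); err != nil {
		return nil, err
	}
	return array.NewRecordReader(src.Schema(), chunks)
}
```

Binding something like `reChunk(reader, 10_000)` instead of `reader` is the sort of change I mean by forcing smaller batches.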
I can keep playing with the settings to move the performance around, but so
far I haven't managed to reproduce anything close to the hour-plus run you
described. Can you share more details about that setup?
* Which driver manager/language were you calling the ADBC driver from?
* What was the batch size of the record reader you were feeding the data
through? (See the snippet after these questions for the knob I mean.)
* Were you using the default options or custom concurrency settings?
* Is 27.9M rows of the NYC Yellow Taxi dataset representative enough of your
data for this to be a comparable test?
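On the batch-size question, here's a hedged sketch of how that is typically controlled with arrow-go's pqarrow reader for a single file (not the exact reader I used); `BatchSize` sets the maximum number of rows per record:

```go
import (
	"context"

	"github.com/apache/arrow-go/v18/arrow/memory"
	"github.com/apache/arrow-go/v18/parquet/file"
	"github.com/apache/arrow-go/v18/parquet/pqarrow"
)

// openParquetRecords opens one parquet file and returns a record reader whose
// batches hold at most batchSize rows. Illustrative helper only; real code
// would also arrange to close the underlying file reader when done.
func openParquetRecords(ctx context.Context, path string, batchSize int64) (pqarrow.RecordReader, error) {
	rdr, err := file.OpenParquetFile(path, false)
	if err != nil {
		return nil, err
	}
	arrRdr, err := pqarrow.NewFileReader(rdr,
		pqarrow.ArrowReadProperties{BatchSize: batchSize, Parallel: true},
		memory.DefaultAllocator)
	if err != nil {
		return nil, err
	}
	// nil column and row-group selections mean "read everything"
	return arrRdr.GetRecordReader(ctx, nil, nil)
}
```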
For reference, here's the Go code for my little test:
```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	"github.com/apache/arrow-adbc/go/adbc"
	"github.com/apache/arrow-adbc/go/adbc/driver/snowflake"
	// match the Arrow major version to the one your arrow-adbc release uses
	"github.com/apache/arrow-go/v18/arrow/memory"
)

func main() {
	drv := snowflake.NewDriver(memory.DefaultAllocator)
	db, err := drv.NewDatabase(map[string]string{
		"uri": os.Getenv("SNOWFLAKE_URI"),
	})
	if err != nil {
		panic(err)
	}
	defer db.Close()

	ctx := context.Background()
	conn, err := db.Open(ctx)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// collect the parquet files that make up the test dataset
	matches, err := filepath.Glob("*.parquet")
	if err != nil {
		panic(err)
	}
	// ... create a record reader ("reader") over the parquet files in matches

	stmt, err := conn.NewStatement()
	if err != nil {
		panic(err)
	}
	defer stmt.Close()

	// bind the stream and point bulk ingestion at the target table
	if err := stmt.BindStream(ctx, reader); err != nil {
		panic(err)
	}
	if err := stmt.SetOption(adbc.OptionKeyIngestTargetTable, "adbc_slow_ingest"); err != nil {
		panic(err)
	}

	n, err := stmt.ExecuteUpdate(ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println("records ingested:", n)
}
```
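If it helps, these are the kinds of statement options I meant by "settings": the Snowflake driver exposes concurrency and target-file-size knobs for bulk ingestion. A minimal sketch, with option keys taken from the Snowflake driver docs (worth double-checking against the driver version you're on) and example values rather than recommendations:

```go
import "github.com/apache/arrow-adbc/go/adbc"

// applyIngestTuning shows how the Snowflake bulk-ingest knobs are set on a
// statement; the values below are examples, not recommendations.
func applyIngestTuning(stmt adbc.Statement) error {
	opts := map[string]string{
		"adbc.snowflake.statement.ingest_writer_concurrency": "8",
		"adbc.snowflake.statement.ingest_upload_concurrency": "8",
		"adbc.snowflake.statement.ingest_copy_concurrency":   "4",
		// target size (in bytes) of each file written to the stage
		"adbc.snowflake.statement.ingest_target_file_size": "10485760",
	}
	for k, v := range opts {
		if err := stmt.SetOption(k, v); err != nil {
			return err
		}
	}
	return nil
}
```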
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]