Zan-L commented on issue #1997:
URL: https://github.com/apache/arrow-adbc/issues/1997#issuecomment-2221616082

   @zeroshade Thanks for the insights. Here is my feedback:
   1. `adbc.snowflake.statement.ingest_target_file_size` and 
`adbc.snowflake.statement.ingest_writer_concurrency` did finally take effect when set 
through `cursor.adbc_statement.set_options()` (so there is indeed a mistake in the 
documentation; see the first sketch after this list). However, the data is still split 
into as many files as there are concurrent writers, regardless of 
`adbc.snowflake.statement.ingest_target_file_size`.
   2. I now tend to believe the problem is caused by the switch to `WriteBuffered(rec)`. 
My data is written as a PyArrow dataset by the Delta Lake PyArrow engine, which uses 
the default row group size of 1M rows. If this function ends up buffering at least one 
full 1M-row row group, then for a dataset of millions of rows with many columns, 
especially string columns, it effectively means all of the data is read into memory 
uncompressed (see the second sketch after this list for how I could re-chunk the data 
before ingestion).
   3. I don't yet have a way to monitor the memory usage of the containers, since they 
are ephemeral ECS tasks, but I did try increasing the memory from 16 GB to 32 GB, which 
didn't help. For what it's worth, the data for the failing jobs is on the scale of 
hundreds of megabytes.
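
   For reference, this is roughly how I am setting the statement options now. The 
connection URI, table name, and option values below are placeholders for my actual 
setup, not the real configuration:

```python
import pyarrow as pa
import adbc_driver_snowflake.dbapi

# Placeholder connection URI; real account/credentials omitted.
conn = adbc_driver_snowflake.dbapi.connect("user:password@account/database/schema")

# Small stand-in table; in reality the data comes from a much larger dataset.
table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

with conn.cursor() as cursor:
    # The option keys contain dots, so they have to be expanded from a dict.
    cursor.adbc_statement.set_options(**{
        # Illustrative values only (target size in bytes, writer count).
        "adbc.snowflake.statement.ingest_target_file_size": str(100 * 1024 * 1024),
        "adbc.snowflake.statement.ingest_writer_concurrency": "4",
    })
    cursor.adbc_ingest("MY_TABLE", table, mode="append")
conn.commit()
```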
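
   And this is a rough sketch of what I mean in point 2 about re-chunking the data 
before ingestion instead of handing the driver the writer's 1M-row row groups. The 
dataset path and batch size are placeholders, the sketch reads the Parquet files 
directly (ignoring the Delta transaction log), and I haven't yet confirmed this 
actually reduces the driver's buffering:

```python
import pyarrow as pa
import pyarrow.dataset as ds
import adbc_driver_snowflake.dbapi

# Placeholder path to the Parquet files written by the Delta Lake PyArrow engine.
dataset = ds.dataset("/data/my_delta_table", format="parquet")

# Re-chunk into smaller batches (64k rows here) instead of the default
# 1M-row row groups, so less data is materialized at once on the client side.
batches = dataset.to_batches(batch_size=64_000)
reader = pa.RecordBatchReader.from_batches(dataset.schema, batches)

# Placeholder connection, same as in the previous sketch.
conn = adbc_driver_snowflake.dbapi.connect("user:password@account/database/schema")
with conn.cursor() as cursor:
    cursor.adbc_ingest("MY_TABLE", reader, mode="append")
conn.commit()
```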

