Zan-L commented on issue #1997: URL: https://github.com/apache/arrow-adbc/issues/1997#issuecomment-2221616082
@zeroshade Thanks for the insights. Here is my feedback:

1. `adbc.snowflake.statement.ingest_target_file_size` and `adbc.snowflake.statement.ingest_writer_concurrency` finally worked with `cursor.adbc_statement.set_options()` (so there is indeed a mistake in the documentation; see the sketch below). However, the data is still split into as many files as there are concurrent writers, regardless of `adbc.snowflake.statement.ingest_target_file_size`.
2. I now suspect the switch to `WriteBuffered(rec)` is the cause. My data is written to a PyArrow Dataset via the Delta Lake PyArrow engine, which uses the default row group size of 1M rows. If this call buffers at least one full row group of 1M rows before flushing, then for a dataset of millions of rows with many columns, especially string ones, it means all of the data ends up in memory uncompressed.
3. I don't have a way to monitor the memory usage of the containers yet, as they are ephemeral ECS tasks. I did try increasing the memory from 16 GB to 32 GB, which didn't resolve the failures. For what it's worth, the data for the failing jobs is on the scale of hundreds of megabytes.
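
For reference, here is roughly how I set the two statement options from the Python DB-API layer. This is a minimal sketch: the connection URI, option values, target table, and sample data are placeholders; only the option keys and the use of `cursor.adbc_statement.set_options()` come from the discussion above.

```python
import pyarrow as pa
import adbc_driver_snowflake.dbapi

# Placeholder connection URI -- substitute real account details.
conn = adbc_driver_snowflake.dbapi.connect("user:password@account/database/schema")

# Sample data; in my case the batches come from a Delta Lake / PyArrow dataset.
table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

cur = conn.cursor()
try:
    # Option keys as discussed above; the values (target file size in bytes and
    # writer count, passed as strings) are illustrative assumptions.
    cur.adbc_statement.set_options(**{
        "adbc.snowflake.statement.ingest_target_file_size": str(100 * 1024 * 1024),
        "adbc.snowflake.statement.ingest_writer_concurrency": "1",
    })
    cur.adbc_ingest("MY_TABLE", table, mode="create")
    conn.commit()
finally:
    cur.close()
    conn.close()
```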
