zeroshade commented on issue #1997:
URL: https://github.com/apache/arrow-adbc/issues/1997#issuecomment-2223213934
If the data that is failing is only in the hundreds of megabytes, this seems more related to the calloc issues we've hit in the past, though it's still odd that the buffered writing runs into this at those file sizes: we shouldn't see an allocation failure on the order of hundreds of megabytes if the task has 16GB - 32GB of memory available. (Unless that "hundreds of megabytes" refers to the compressed size?)

Out of curiosity, would it be possible for you to log the sizes of the record batches being produced: number of columns, number of rows, or some estimate of each batch's memory footprint? (A rough sketch of what I mean is below.) Since you are unable to share the data itself, anything that helps us try to reproduce this would be beneficial.

> `adbc.snowflake.statement.ingest_target_file_size` and `adbc.snowflake.statement.ingest_writer_concurrency` finally worked with `cursor.adbc_statement.set_options()` (so there is indeed a mistake in the documentation). However, the data is still split into number of concurrent writers despite `adbc.snowflake.statement.ingest_target_file_size`.

Do you get more files with `adbc.snowflake.statement.ingest_writer_concurrency`?
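To be concrete about the kind of logging I mean, here is a minimal sketch on the Python side, assuming the data reaches `adbc_ingest` as a `pyarrow.RecordBatchReader`. The names `original_reader`, `cursor`, and `"my_table"` are placeholders for whatever you are actually ingesting:

```python
import pyarrow as pa

def logged(reader: pa.RecordBatchReader):
    """Yield batches unchanged while logging their shape and approximate size."""
    for i, batch in enumerate(reader):
        print(
            f"batch {i}: rows={batch.num_rows} "
            f"cols={batch.num_columns} approx_bytes={batch.nbytes}"
        )
        yield batch

# Wrap the original reader so every batch is logged on its way to the driver.
reader = pa.RecordBatchReader.from_batches(
    original_reader.schema, logged(original_reader)
)
cursor.adbc_ingest("my_table", reader, mode="append")
```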

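For reference, a hedged sketch of varying the two options from the quoted comment via the dbapi cursor, to see whether the number of files tracks the writer concurrency. The numeric values and table name are placeholders, not recommendations:

```python
# Option names are from the quoted comment; the values below are placeholders.
cursor.adbc_statement.set_options(**{
    "adbc.snowflake.statement.ingest_target_file_size": "104857600",  # placeholder: ~100 MB
    "adbc.snowflake.statement.ingest_writer_concurrency": "1",        # placeholder
})
cursor.adbc_ingest("my_table", reader, mode="append")  # "my_table" is a placeholder
```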