zeroshade commented on issue #1997:
URL: https://github.com/apache/arrow-adbc/issues/1997#issuecomment-2223213934

   If the data that is failing is only in the hundreds of megabytes, this seems 
more related to the issues we've had in the past with calloc, though it's still 
odd that the buffered writing is hitting this at those file sizes. We shouldn't 
be seeing a failure to allocate memory on the scale of hundreds of megabytes if 
the task has 16GB - 32GB of memory available. (Unless that "hundreds of 
megabytes" refers to the compressed size?)
   
   Out of curiosity, would it be possible for you to log the sizes of the record 
batches being produced (number of columns, number of rows, or an estimate of the 
in-memory size of each batch)? Since you are unable to share the data itself, 
anything that helps us try to reproduce this would be beneficial.
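   
   For example, if you're handing a pyarrow `RecordBatchReader` to `adbc_ingest`, 
something like this rough sketch could log each batch as it streams (the reader 
and table names here are just placeholders, not from your code):
   
   ```python
   import logging
   import pyarrow as pa
   
   logging.basicConfig(level=logging.INFO)
   log = logging.getLogger("ingest")
   
   def log_batches(reader: pa.RecordBatchReader) -> pa.RecordBatchReader:
       """Wrap a RecordBatchReader and log each batch's shape/size as it streams."""
       def gen():
           for i, batch in enumerate(reader):
               log.info("batch %d: %d cols x %d rows, ~%d bytes",
                        i, batch.num_columns, batch.num_rows, batch.nbytes)
               yield batch
       return pa.RecordBatchReader.from_batches(reader.schema, gen())
   
   # illustrative usage; "my_table" and `reader` are placeholders
   # cursor.adbc_ingest("my_table", log_batches(reader), mode="create")
   ```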
   
   > `adbc.snowflake.statement.ingest_target_file_size` and 
`adbc.snowflake.statement.ingest_writer_concurrency` finally worked with 
cursor.adbc_statement.set_options() (so there is indeed a mistake in the 
documentation). However, the data is still split according to the number of 
concurrent writers despite `adbc.snowflake.statement.ingest_target_file_size`.
   
   Do you get more files if you increase 
`adbc.snowflake.statement.ingest_writer_concurrency`?
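   
   For comparison, this is roughly how I'd expect both options to be set on the 
statement before ingesting (the values below are only illustrative placeholders, 
not recommendations):
   
   ```python
   # sketch: set both ingest options on the underlying ADBC statement;
   # the target file size value is in bytes and is a placeholder
   cursor.adbc_statement.set_options(**{
       "adbc.snowflake.statement.ingest_target_file_size": str(250 * 1024 * 1024),
       "adbc.snowflake.statement.ingest_writer_concurrency": "4",
   })
   ```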

