zeroshade commented on issue #1997:
URL: https://github.com/apache/arrow-adbc/issues/1997#issuecomment-2221485769

   > @zeroshade I can confirm something a bit different from what you asked in 
2., but more useful: the same data is split correctly into Parquet files of 
~10 MB in ADBC 1.0.0, but only 4 in 1.1.0, on a VM with 4 cores. That should 
rule out a cause on the data side.
   
   Not necessarily. The big change we made between v1.0.0 and v1.1.0 was the 
switch from calling `Write(rec)` to `WriteBuffered(rec)` when writing record 
batches to the Parquet files (to work around a bug on Snowflake's side when it 
handles empty row groups in a Parquet file). Depending on the size of the row 
groups we're talking about, the issue could be stats-related or something else 
entirely. Either way, it would be interesting to see where the memory is 
actually being used.
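
   In the meantime, if you want a rough number from the Python side, something 
like the untested sketch below would at least show how the process's peak RSS 
grows while the ingest runs (`run_ingest()` is just a hypothetical stand-in 
for whatever `adbc_ingest` call you're making). It won't tell us *where* 
inside the driver the memory goes, but it should confirm whether usage tracks 
the file-size/concurrency settings:

   ```python
   import resource
   import threading
   import time


   def watch_rss(stop: threading.Event, interval: float = 2.0) -> None:
       """Print the process's peak RSS every few seconds (ru_maxrss is KiB on Linux)."""
       while not stop.is_set():
           peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
           print(f"peak RSS so far: {peak_kib / 1024:.1f} MiB")
           time.sleep(interval)


   stop = threading.Event()
   watcher = threading.Thread(target=watch_rss, args=(stop,), daemon=True)
   watcher.start()
   try:
       run_ingest()  # hypothetical stand-in for the failing adbc_ingest() call
   finally:
       stop.set()
       watcher.join()
   ```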
   
   > It is actually the recommended way to tune using the AdbcStatement, but it 
raises a NotImplemented error at set_options(), which contradicts [the 
doc](https://arrow.apache.org/adbc/current/driver/snowflake.html#bulk-ingestion).
   
   This is because the doc appears to be incorrect (@joellubi we should fix 
this!). The actual option in the code is 
`adbc.snowflake.statement.ingest_target_file_size`. Can you try that instead 
and see if it helps? You could also try 
`adbc.snowflake.statement.ingest_writer_concurrency` to change the number of 
writers (which defaults to the number of available CPUs).
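
   For reference, here's roughly what setting those options looks like through 
the Python dbapi layer (untested sketch; the URI, table name, data, and option 
values are just placeholders, and depending on the driver/bindings version the 
values may need to be passed as strings rather than ints):

   ```python
   import adbc_driver_snowflake.dbapi
   import pyarrow as pa

   arrow_table = pa.table({"id": [1, 2, 3]})  # placeholder data

   with adbc_driver_snowflake.dbapi.connect("<snowflake uri>") as conn:
       with conn.cursor() as cur:
           cur.adbc_statement.set_options(**{
               # target size in bytes of each Parquet file uploaded to the stage
               "adbc.snowflake.statement.ingest_target_file_size": 100 * 1024 * 1024,
               # number of parallel Parquet writers (defaults to the CPU count)
               "adbc.snowflake.statement.ingest_writer_concurrency": 2,
           })
           cur.adbc_ingest("my_table", arrow_table, mode="create")
       conn.commit()
   ```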
   
   > adbc_driver_manager.InternalError: INTERNAL: unknown error type: cannot 
allocate memory
   
   Have you checked what the actual memory usage of the container is when this 
happens? It might be https://github.com/apache/arrow-adbc/issues/1283 showing 
up again in a different context, if we're using `calloc` somewhere that 
doesn't fall back to `malloc` + `memset`.
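
   One quick way to check from inside the container, assuming cgroup v2 is in 
use (on cgroup v1 the equivalent files are 
`/sys/fs/cgroup/memory/memory.limit_in_bytes` and `memory.usage_in_bytes`):

   ```python
   from pathlib import Path


   def cgroup(name: str) -> str:
       """Read a cgroup v2 memory file for the current container/process."""
       return Path("/sys/fs/cgroup", name).read_text().strip()


   print("memory.max:    ", cgroup("memory.max"))      # "max" means no limit configured
   print("memory.current:", cgroup("memory.current"))  # bytes currently charged to the cgroup
   ```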

