zeroshade commented on issue #1997: URL: https://github.com/apache/arrow-adbc/issues/1997#issuecomment-2221485769
> @zeroshade I can confirm something a bit different than what you asked in 2. but more useful - the same data would be split correctly into parquet files of ~10 MB in ADBC 1.0.0, but only 4 in 1.1.0 in a VM of 4 cores. That should rule out the cause from the data side.

Not necessarily. The big change we made between v1.0.0 and v1.1.0 was the switch from calling `Write(rec)` to `WriteBuffered(rec)` when writing record batches to the Parquet files (to work around a bug on Snowflake's side when handling empty row groups in a Parquet file). Depending on the size of the row groups we're talking about, the issue could be statistics-related or something else entirely. Either way, it would be helpful to see where the memory is actually being used.

> It is actually the recommended way to tune using the AdbcStatement but it raises a NotImplemented error at set_options(), and thus contradicts [the doc](https://arrow.apache.org/adbc/current/driver/snowflake.html#bulk-ingestion).

That's because the doc is incorrect (@joellubi we should fix this!). The actual option name in the code is `adbc.snowflake.statement.ingest_target_file_size`. Can you try that instead and see if it helps? You could also try `adbc.snowflake.statement.ingest_writer_concurrency` to change the number of writers (which defaults to the number of available CPUs). See the sketch at the end of this comment.

> adbc_driver_manager.InternalError: INTERNAL: unknown error type: cannot allocate memory

Have you checked the container's actual memory usage when this happens? It might be https://github.com/apache/arrow-adbc/issues/1283 showing up again in a different context, if we're using `calloc` somewhere that doesn't fall back to `malloc` + `memset`.
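For reference, a minimal sketch of setting those options through the Python DBAPI layer. The option names are the ones quoted above from the driver code; the connection URI, table name, sample data, and the 10 MB target size are hypothetical placeholders, not values from this issue:

```python
import pyarrow as pa
import adbc_driver_snowflake.dbapi

# Hypothetical connection URI and sample data; substitute your own.
conn = adbc_driver_snowflake.dbapi.connect("user:pass@account/db/schema")
table = pa.table({"a": [1, 2, 3]})

with conn:
    with conn.cursor() as cur:
        # Set ingestion options on the underlying AdbcStatement before
        # ingesting. Values are passed as strings; 10485760 bytes = 10 MB
        # target Parquet file size, and 2 concurrent writer goroutines.
        cur.adbc_statement.set_options(**{
            "adbc.snowflake.statement.ingest_target_file_size": "10485760",
            "adbc.snowflake.statement.ingest_writer_concurrency": "2",
        })
        cur.adbc_ingest("my_table", table, mode="create")
        conn.commit()
```

Lowering `ingest_writer_concurrency` below the CPU count should also cap peak memory, since each writer buffers its own Parquet file before upload.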
