Zan-L opened a new issue, #1997:
URL: https://github.com/apache/arrow-adbc/issues/1997
### What happened?
Jobs calling adbc_ingest() failed with a memory error. On inspection, the
data were split into one Parquet file per processor core, instead of into
files of about 10 MB each as in 1.0.0.
### Stack Trace
adbc_driver_manager.InternalError: INTERNAL: unknown error type: cannot allocate memory
cursor.adbc_ingest(table, data, mode)
File "/usr/local/lib/python3.12/site-packages/adbc_driver_manager/dbapi.py", line 937, in adbc_ingest
return _blocking_call(self._stmt.execute_update, (), {}, self._stmt.cancel)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "adbc_driver_manager/_lib.pyx", line 1569, in adbc_driver_manager._lib._blocking_call_impl
File "adbc_driver_manager/_lib.pyx", line 1562, in adbc_driver_manager._lib._blocking_call_impl
File "adbc_driver_manager/_lib.pyx", line 1295, in adbc_driver_manager._lib.AdbcStatement.execute_update
File "adbc_driver_manager/_lib.pyx", line 260, in adbc_driver_manager._lib.check_error
### How can we reproduce the bug?
Unfortunately, I cannot share the data. However, the behavior should be
reproducible on a four-core VM: when adbc_ingest() is called to upload a
moderately sized dataset (about 500 MB as Parquet) to Snowflake, the data are
split into four ~125 MB files instead of fifty ~10 MB files.
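
To make the numbers above concrete, here is a small sketch of the arithmetic behind the two splitting behaviors. It does not call the driver; the function names are hypothetical and exist only for illustration, and the assumption that 1.1.0 writes exactly one file per core is taken from the observation in this report, not from the driver source.

```python
import math


def split_by_target_size(total_mb: float, target_mb: float) -> tuple[int, float]:
    """Hypothetical model of the 1.0.0 behavior: files of roughly target_mb each."""
    n_files = math.ceil(total_mb / target_mb)
    return n_files, total_mb / n_files


def split_by_worker_count(total_mb: float, n_workers: int) -> tuple[int, float]:
    """Hypothetical model of the observed 1.1.0 behavior: one file per core."""
    return n_workers, total_mb / n_workers


# Figures from the report: ~500 MB of Parquet data on a four-core VM.
expected = split_by_target_size(500, 10)   # (file count, MB per file)
observed = split_by_worker_count(500, 4)   # (file count, MB per file)

print("expected (1.0.0):", expected)
print("observed (1.1.0):", observed)
```

Under this model, the per-file size in the observed case grows with the dataset size and is bounded only by the core count, which would be consistent with the allocation failure on larger inputs.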
### Environment/Setup
Packages:
adbc-driver-manager==1.1.0
adbc-driver-snowflake==1.1.0
Operating system: Windows/Linux
Package manager: pip