mwinters0 opened a new issue, #3611: URL: https://github.com/apache/arrow-adbc/issues/3611
### What happened?

When selecting a specific 30-second time range from my database, I get the pyarrow error below from `cursor.fetch_arrow_table()`. However, if I bisect this time range and query the first 15 seconds and the latter 15 seconds separately, each half works. Additionally, if I export the full 30-second range to CSV with `psql`, I can import it with `pyarrow.csv.read_csv()` without issue. I've examined the time range in duckdb and can't find anything unusual. The rows have many near-duplicates because they are the result of several LEFT JOINs, which is intended.

Zstd was able to compress the 3.0 GB `psql`-exported time ranges down to 1.3 MB (!!), so I've attached them here. I had to add a gzip layer because GitHub doesn't accept `.zst` files.

```console
gunzip bad.csv.zst.gz && zstd -d bad.csv.zst
```

[bad.csv.zst.gz](https://github.com/user-attachments/files/23078531/bad.csv.zst.gz)
[bad.text.zst.gz](https://github.com/user-attachments/files/23080501/bad.text.zst.gz)
[bad.bin.zst.gz](https://github.com/user-attachments/files/23080527/bad.bin.zst.gz)

### Stack Trace

```
[...]
  File "/mnt/ssd/fedora/nomaste/nomaste/workflow/db.py", line 272, in _generate_normalized_time_chunks
    raw_chunk_table = fetch_raw_time_chunk(
        params.conn, chunk_start_date, chunk_end_date, params.topic
    )
  File "/mnt/ssd/fedora/nomaste/nomaste/workflow/db.py", line 248, in fetch_raw_time_chunk
    t = cur.fetch_arrow_table()
  File "/mnt/ssd/fedora/nomaste/.venv/lib/python3.13/site-packages/adbc_driver_manager/dbapi.py", line 1179, in fetch_arrow_table
    return self._results.fetch_arrow_table()
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/mnt/ssd/fedora/nomaste/.venv/lib/python3.13/site-packages/adbc_driver_manager/dbapi.py", line 1346, in fetch_arrow_table
    return _blocking_call(self.reader.read_all, (), {}, self._stmt.cancel)
  File "adbc_driver_manager/_lib.pyx", line 1749, in adbc_driver_manager._lib._blocking_call_impl
  File "adbc_driver_manager/_lib.pyx", line 1742, in adbc_driver_manager._lib._blocking_call_impl
  File "adbc_driver_manager/_reader.pyx", line 91, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all
  File "adbc_driver_manager/_reader.pyx", line 43, in adbc_driver_manager._reader._AdbcErrorHelper.check_error
  File "adbc_driver_manager/_reader.pyx", line 89, in adbc_driver_manager._reader.AdbcRecordBatchReader.read_all
  File "pyarrow/ipc.pxi", line 794, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Expected last offset >= 0 but found -1976420846
```

### How can we reproduce the bug?

_No response_

### Environment/Setup

- Python 3.13.7 on x86_64. I've tried both the `uv`-managed version and the system versions from Fedora and Arch.
- Postgres is running docker tag `timescale/timescaledb-ha:pg13.22-ts2.15.3-oss` from [this Dockerfile](https://github.com/timescale/timescaledb-docker-ha/blob/master/Dockerfile)

```console
% uv tree
Resolved 14 packages in 0.50ms
bus2parq v0.1.0
├── adbc-driver-postgresql v1.8.0
│   ├── adbc-driver-manager v1.8.0
│   │   └── typing-extensions v4.15.0
│   └── importlib-resources v6.5.2
├── backports-zstd v0.5.0
├── click v8.3.0
├── pyarrow v21.0.0
└── pytest v8.4.2 (group: dev)
    ├── iniconfig v2.1.0
    ├── packaging v25.0
    ├── pluggy v1.6.0
    └── pygments v2.19.2
```
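One observation, which is speculation on my part and not confirmed: the negative "last offset" in the error looks like signed 32-bit offset overflow. Arrow's default `string`/`binary` arrays use int32 offsets, so a single array's value buffer is capped at 2 GiB, and reinterpreting the reported `-1976420846` as an unsigned 32-bit byte count lands just past that cap. A quick arithmetic sketch:

```python
# Hypothesis check: is the reported negative offset consistent with int32 wraparound?
# Arrow's default string/binary arrays store int32 offsets, limiting a single
# array's value buffer to 2**31 - 1 bytes (~2 GiB).
INT32_MAX = 2**31 - 1

reported = -1976420846  # from "Expected last offset >= 0 but found -1976420846"

# Reinterpret the negative int32 as the unsigned byte count it would wrap from.
implied_bytes = reported + 2**32
print(implied_bytes)              # 2318546450 bytes, i.e. ~2.16 GiB
print(implied_bytes > INT32_MAX)  # True: just past the 32-bit offset limit
```

If that's what is happening, it would also be consistent with the symptoms above: each 15-second half presumably stays under 2 GiB of string data, and the `psql` CSV export never builds a single oversized Arrow array.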
