jhwang7628 opened a new issue, #38577:
URL: https://github.com/apache/arrow/issues/38577
### Describe the bug, including details regarding any error messages, version, and platform.
Hi,
We have a Parquet file that used to read fine in 13.0.0, but with 14.0.0 I now get an error
when reading it via `pandas.read_parquet`. The relevant error is:
```
  File "/opt/venv/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 3003, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File "/opt/venv/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 2631, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 556, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3713, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowCapacityError: array cannot contain more than 2147483646 bytes, have 2148480400
```
Is this intended behavior? I skimmed through the changelog but did not
find anything about it. Thanks.
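
For reference, the limit in the message equals 2**31 - 2, which would be consistent with a single array using 32-bit offsets (an assumption on my side, not something the traceback confirms). The sketch below only checks that arithmetic and outlines one possible way to read the file in smaller pieces with `pyarrow.parquet.ParquetFile.iter_batches`. The helper names `overflow_bytes` and `read_in_batches` are hypothetical, and I have not verified that batch-wise reading avoids the error for this file.

```python
# Numbers copied from the error message above.
REPORTED_LIMIT = 2_147_483_646  # equals 2**31 - 2
REPORTED_SIZE = 2_148_480_400


def overflow_bytes(size, limit=REPORTED_LIMIT):
    """Return by how many bytes `size` exceeds `limit` (0 if it fits)."""
    return max(0, size - limit)


def read_in_batches(path, batch_size=65_536):
    """Hypothetical workaround: yield pandas DataFrames batch by batch.

    Assumes that no single batch exceeds the per-array limit, which
    may not hold for every file.
    """
    # Imported lazily so the arithmetic above works without pyarrow.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    for batch in parquet_file.iter_batches(batch_size=batch_size):
        yield batch.to_pandas()


if __name__ == "__main__":
    print(REPORTED_LIMIT == 2**31 - 2)
    print(overflow_bytes(REPORTED_SIZE))
```

If this approach applies, the pieces could be combined with `pandas.concat(read_in_batches("file.parquet"))`, where `"file.parquet"` is a placeholder path.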
### Component(s)
Python
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.