Frederik Fabritius created ARROW-16872:
------------------------------------------
Summary: open_csv throws ArrowInvalid if csv does not end with a
new line and is above 16384 lines
Key: ARROW-16872
URL: https://issues.apache.org/jira/browse/ARROW-16872
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 8.0.0, 7.0.0
Reporter: Frederik Fabritius
`pyarrow.csv.open_csv` raises `ArrowInvalid` if the CSV input does not end with
a newline and has more than 16384 lines. Tested with both pyarrow 7.0.0 and
8.0.0. The error occurs both in our production app and on a developer laptop.
Here's a minimal case for reproducing the issue:
```python
import pyarrow as pa
import pyarrow.csv
from io import BytesIO

# Header plus 16385 rows, joined with '\n' and no trailing newline.
rows = ['review_id,filter_outcome'] + ['62593aaec7628b203bad4c6e,fabrication'] * 16385
for _ in pa.csv.open_csv(BytesIO('\n'.join(rows).encode())):
    pass
```
The following error is raised:
ArrowInvalid: CSV parse error: Expected 2 columns, got 1:
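A possible stopgap, based only on the observation above that the failure requires a missing trailing newline, is to append one before parsing. This is a minimal sketch: `ensure_trailing_newline` is a hypothetical helper and not part of pyarrow, and I have not confirmed that it avoids the error in every case.

```python
from io import BytesIO


def ensure_trailing_newline(data: bytes) -> bytes:
    # Hypothetical helper (not a pyarrow API): append a final newline byte
    # when the input is non-empty and does not already end with one.
    if data and not data.endswith(b"\n"):
        return data + b"\n"
    return data


# Same shape as the reproduction input: a header plus 16385 rows, no final newline.
rows = ["review_id,filter_outcome"] + ["62593aaec7628b203bad4c6e,fabrication"] * 16385
payload = ensure_trailing_newline("\n".join(rows).encode())
stream = BytesIO(payload)  # file-like object that could be handed to the CSV reader
```

`stream` could then be passed to `pyarrow.csv.open_csv` in place of the original `BytesIO`. This only applies when the whole input fits in memory, so it is not a substitute for a fix in the streaming reader.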