Frederik Fabritius created ARROW-16872:
------------------------------------------

             Summary: open_csv throws ArrowInvalid if csv does not end with a 
new line and is above 16384 lines
                 Key: ARROW-16872
                 URL: https://issues.apache.org/jira/browse/ARROW-16872
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 8.0.0, 7.0.0
            Reporter: Frederik Fabritius


`pyarrow.csv.open_csv` raises `ArrowInvalid` when the CSV input does not end
with a newline and has more than 16384 lines. Tested with both pyarrow 7.0.0
and 8.0.0. The error occurs both in a production app and on a developer laptop.

 

Here's a minimal case for reproducing the issue:

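```python
from io import BytesIO

import pyarrow as pa
import pyarrow.csv

# Header plus 16385 data rows, joined so that the input has no trailing newline.
rows = ['review_id,filter_outcome'] + ['62593aaec7628b203bad4c6e,fabrication'] * 16385
data = '\n'.join(rows).encode()

# Iterating over the streaming reader is what triggers the error.
for _ in pa.csv.open_csv(BytesIO(data)):
    pass
```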

 

The following error is raised:

ArrowInvalid: CSV parse error: Expected 2 columns, got 1: 
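
Since the failure is only reported for input that lacks a final newline, one possible stopgap is to append a newline before handing the bytes to the reader. The sketch below is an untested assumption drawn from the report, not a confirmed fix, and `with_trailing_newline` is a hypothetical helper name, not part of pyarrow.

```python
from io import BytesIO


def with_trailing_newline(data: bytes) -> bytes:
    """Append b"\\n" unless data is empty or already ends with a newline.

    Hypothetical helper for illustration; not a pyarrow API.
    """
    if not data or data.endswith(b"\n"):
        return data
    return data + b"\n"


# Same payload as the reproduction above: a header plus 16385 rows, no final newline.
rows = ["review_id,filter_outcome"] + ["62593aaec7628b203bad4c6e,fabrication"] * 16385
payload = with_trailing_newline("\n".join(rows).encode())

# The padded stream could then be passed to pyarrow.csv.open_csv(stream).
stream = BytesIO(payload)
```

This copies the whole payload in memory, so it only suits inputs small enough to hold as a single bytes object.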



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
