[ https://issues.apache.org/jira/browse/ARROW-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235624#comment-17235624 ]
Wes McKinney commented on ARROW-10579:
--------------------------------------

If you want to share the file privately with one of us, you can determine our e-mail addresses from the mailing lists or {{git log}}.

> [Python] read_csv from a large file with long string columns failed to parse the input correctly
> ------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-10579
>                 URL: https://issues.apache.org/jira/browse/ARROW-10579
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 2.0.0
>         Environment: python 3.8 pyarrow 2.0.0
>            Reporter: ran
>            Priority: Major
>
> import pyarrow.csv as pac  # PyArrow is installed with `datasets`
> read_options = pac.ReadOptions(block_size=1e9)
> parse_options = pac.ParseOptions(delimiter=',')
> table = pac.read_csv('0.csv', read_options=read_options, parse_options=parse_options)
>
> I get the following error:
>
> ~/anaconda2/envs/nlp/lib/python3.8/site-packages/pyarrow/_csv.pyx in pyarrow._csv.read_csv()
> ~/anaconda2/envs/nlp/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
> ~/anaconda2/envs/nlp/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: CSV parse error: Expected 5 columns, got 187
>
> I validated that the CSV is good: I am able to read it using Pandas and write+read it to HDF5.
>
> Thanks for the help :)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
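If the file cannot be shared, one way to narrow down an "Expected 5 columns, got 187" error is to locate the offending records locally. The sketch below is not from the thread and uses only Python's standard `csv` module; the helper names `mismatched_records` and `naive_field_counts` are hypothetical. It assumes the header row defines the expected width. Comparing a quote-aware count with a naive per-line split can hint whether quoted fields containing delimiters or line breaks are involved, which in turn suggests which `pyarrow.csv.ParseOptions` settings (for example `quote_char`, `escape_char` or `newlines_in_values`) are worth checking. It does not establish the cause of this particular report.

```python
import csv
import io
from typing import List, Optional, TextIO, Tuple


def mismatched_records(f: TextIO, delimiter: str = ",",
                       expected: Optional[int] = None) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: return (record_index, start_line, field_count) for
    every record whose field count differs from `expected` (by default, the
    header's field count).

    csv.reader honours double-quoted fields, so delimiters and line breaks
    inside quotes do not split a value, and one record may span several lines.
    """
    reader = csv.reader(f, delimiter=delimiter)
    bad = []
    start_line = 1  # physical line on which the next record begins
    for index, row in enumerate(reader):
        if expected is None:
            expected = len(row)  # the first record (header) sets the width
        elif len(row) != expected:
            bad.append((index, start_line, len(row)))
        # line_num counts physical lines consumed so far, not records
        start_line = reader.line_num + 1
    return bad


def naive_field_counts(f: TextIO, delimiter: str = ",") -> List[int]:
    """Hypothetical helper: field count per physical line, ignoring quotes."""
    return [len(line.rstrip("\r\n").split(delimiter)) for line in f]


# Small illustrative input: a quoted comma, a quoted line break, one bad row.
sample = 'id,text\n1,"hello, world"\n2,"line one\nline two"\n3,a,b\n'

print(mismatched_records(io.StringIO(sample)))  # [(3, 5, 3)]
print(naive_field_counts(io.StringIO(sample)))  # [2, 3, 2, 1, 3]
```

On a real file, open it with `open(path, newline="")` as the `csv` module documentation recommends. If the quote-aware pass reports no mismatches while the naive counts vary widely, quoting or embedded newlines are a likely place to look.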