[ https://issues.apache.org/jira/browse/ARROW-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675045#comment-16675045 ]
Antoine Pitrou commented on ARROW-3700:
---------------------------------------
Indeed the CSV parser doesn't accept blank lines. Is that a common occurrence?
Should we always accept them, or only optionally? [~wesmckinn]
> read_csv behavior with blank lines differs between CSV delimiters
> -----------------------------------------------------------------
>
> Key: ARROW-3700
> URL: https://issues.apache.org/jira/browse/ARROW-3700
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Ultrabug
> Priority: Major
> Attachments: csv_parse_error.zip
>
>
> This is a copy/paste of the github issue:
> https://github.com/apache/arrow/issues/2883
>
> Hi,
> I was playing with {{pyarrow.csv}} {{read_csv}} and found a rather strange
> behavior that I'm not sure is normal.
> Parsing fails if the delimiter of the CSV file is a comma and there's a
> blank line after the header (see the {{basic_with_blank.csv}} example).
> Example output:
> Traceback (most recent call last):
>   File "sorrow.py", line 14, in <module>
>     table = pa_csv.read_csv(csv)
>   File "pyarrow/_csv.pyx", line 198, in pyarrow._csv.read_csv
>   File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: CSV parse error: Expected 2 columns, got 1
> If I change the CSV delimiter to semicolon, the error disappears and
> everything is fine!
> I'm providing Python code and CSV samples that compare the result with
> pandas (which does not suffer from this).
> Hope this helps, thanks
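
As an illustration of the report above, here is a minimal sketch of one possible workaround: removing blank lines from the CSV text before it reaches a strict parser. It uses only the Python standard library and does not call pyarrow. The helper name drop_blank_lines and the sample data are hypothetical, not taken from the attached files. It assumes blank lines carry no data, and it would corrupt a quoted field that contains an embedded blank line.

```python
import csv
import io


def drop_blank_lines(text):
    """Return `text` with empty or whitespace-only lines removed.

    Hypothetical helper; assumes no quoted field spans a blank line.
    """
    kept = [line for line in text.splitlines(keepends=True) if line.strip()]
    return "".join(kept)


# Hypothetical sample: a header, a blank line, then one data row.
sample = "a,b\n\n1,2\n"

# Python's csv module yields an empty row for the blank line, which is the
# kind of short row a strict "every row has N columns" check rejects.
raw_rows = list(csv.reader(io.StringIO(sample)))

# After filtering, every remaining row has the same number of fields.
cleaned_rows = list(csv.reader(io.StringIO(drop_blank_lines(sample))))
```

The cleaned text could then be handed to whichever CSV reader is in use; whether a given reader should skip blank lines by itself is the open question in this issue.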