[ https://issues.apache.org/jira/browse/ARROW-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932511#comment-16932511 ]
Antoine Pitrou commented on ARROW-6003:
---------------------------------------

Ok, so I get this:
{code:python}
>>> b = b"""5.1,3.5,1.4,0.2,"setosa"
   ...: 4.9,3,1.4,0.2,"setosa"
   ...: """
>>> csv.read_csv(io.BytesIO(b))
pyarrow.Table
5.1: double
3.5: int64
1.4: double
0.2: double
setosa: string
>>> csv.read_csv(io.BytesIO(b), read_options=csv.ReadOptions(column_names=['a', 'b']))
Traceback (most recent call last):
  File "<ipython-input-7-a2e61d90e816>", line 1, in <module>
    csv.read_csv(io.BytesIO(b), read_options=csv.ReadOptions(column_names=['a', 'b']))
  File "pyarrow/_csv.pyx", line 541, in pyarrow._csv.read_csv
    check_status(reader.get().Read(&table))
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
ArrowInvalid: CSV parse error: Expected 2 columns, got 5
{code}

> [C++] Better input validation and error messaging in CSV reader
> ---------------------------------------------------------------
>
>                 Key: ARROW-6003
>                 URL: https://issues.apache.org/jira/browse/ARROW-6003
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: csv
>
> Followup to https://issues.apache.org/jira/browse/ARROW-5747. The error
> message(s) are not great when you give bad input. For example, if I give too
> many or too few {{column_names}}, the error I get is {{Invalid: Empty CSV
> file}}. In fact, that's about the only error message I've seen from the CSV
> reader, no matter what I've thrown at it.
> It would be better if error messages were more specific so that I as a user
> might know how to fix my bad input.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
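
For illustration only, here is a minimal caller-side sketch of the kind of specific message the issue asks for when {{column_names}} does not match the data. It uses only the Python standard library; {{check_column_names}} is a hypothetical helper, not part of pyarrow, and it assumes UTF-8 input with the default comma delimiter.

```python
import csv
import io


def check_column_names(data, column_names):
    """Hypothetical pre-check: compare len(column_names) with the first CSV row.

    Raises ValueError with a specific message when the counts differ.
    Assumes UTF-8 bytes and a comma delimiter.
    """
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    first_row = next(reader, None)
    if first_row is None:
        raise ValueError("CSV input is empty")
    if len(first_row) != len(column_names):
        raise ValueError(
            "column_names has {} entries, but the first row has {} columns".format(
                len(column_names), len(first_row)
            )
        )


data = b'5.1,3.5,1.4,0.2,"setosa"\n4.9,3,1.4,0.2,"setosa"\n'

# Five names for five columns: no error is raised.
check_column_names(data, ["a", "b", "c", "d", "e"])

# Two names for five columns: the message names both counts.
try:
    check_column_names(data, ["a", "b"])
except ValueError as exc:
    message = str(exc)
    print(message)
```

A check like this only inspects the first row, so it would not catch rows of inconsistent width later in the file; that case is still left to the reader itself.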