Joris Van den Bossche created ARROW-18107:
---------------------------------------------

             Summary: [C++] Provide more informative error when (CSV/JSON) parsing fails
                 Key: ARROW-18107
                 URL: https://issues.apache.org/jira/browse/ARROW-18107
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Joris Van den Bossche


Related to ARROW-18106 (and derived from 
https://stackoverflow.com/questions/74138746/why-i-cant-parse-timestamp-in-pyarrow).
 

Assume you have the following code to read a JSON file with timestamps. The 
timestamp strings include a sub-second part, so parsing fails if you specify 
the column as a second-resolution timestamp:

{code:python}
import io

import pyarrow as pa  # needed for pa.schema and pa.timestamp below
from pyarrow import json

s_json = """{"column":"2022-09-05T08:08:46.000"}"""

opts = json.ParseOptions(explicit_schema=pa.schema([("column", 
pa.timestamp("s"))]), unexpected_field_behavior="ignore")
json.read_json(io.BytesIO(s_json.encode()), parse_options=opts)
{code}

gives:

{code}
ArrowInvalid: Failed of conversion of JSON to timestamp[s], couldn't 
parse:2022-09-05T08:08:46.000
{code}

This error is expected, but I think it could be more informative about why 
parsing failed: at first sight the value looks like a proper timestamp string, 
so you might be left wondering what is wrong with it.

(This might not be that straightforward, though, since there can be many 
reasons why parsing fails.)
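
To make the idea concrete, here is a minimal pure-Python sketch of the kind of hint such an error could carry. It is not Arrow's implementation: the function name `explain_timestamp_failure`, the `MAX_FRACTION_DIGITS` table and the regular expression are all hypothetical, and the check assumes that the only cause being diagnosed is a sub-second part that does not fit the requested unit, as in the example above. Arrow's real parser may accept or reject inputs that this sketch does not cover.

```python
import re

# Maximum number of fractional-second digits each unit can represent.
# Hypothetical table for this sketch; keys mirror the unit names of pa.timestamp().
MAX_FRACTION_DIGITS = {"s": 0, "ms": 3, "us": 6, "ns": 9}

# Matches a plain ISO 8601-like timestamp with an optional fractional part.
# Deliberately narrow: no time zone offsets, no date-only strings.
_TIMESTAMP_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.(?P<fraction>\d+))?$"
)


def explain_timestamp_failure(value, unit):
    """Return a hint on why `value` does not fit timestamp[unit], or None."""
    if unit not in MAX_FRACTION_DIGITS:
        raise ValueError(f"unknown unit: {unit!r}")
    match = _TIMESTAMP_RE.match(value)
    if match is None:
        return f"{value!r} does not look like YYYY-MM-DDTHH:MM:SS[.fff]"
    fraction = match.group("fraction") or ""
    allowed = MAX_FRACTION_DIGITS[unit]
    if len(fraction) > allowed:
        return (
            f"{value!r} has {len(fraction)} fractional-second digit(s), "
            f"but timestamp[{unit}] allows at most {allowed}"
        )
    return None  # nothing found by this single check


print(explain_timestamp_failure("2022-09-05T08:08:46.000", "s"))
```

A hint of this kind could be appended to the existing message. As noted above, a real implementation would have to handle many more failure reasons than this one check does.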







--
This message was sent by Atlassian Jira
(v8.20.10#820010)
