Joris Van den Bossche created ARROW-18107:
---------------------------------------------
Summary: [C++] Provide more informative error when (CSV/JSON) parsing fails
Key: ARROW-18107
URL: https://issues.apache.org/jira/browse/ARROW-18107
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Joris Van den Bossche
Related to ARROW-18106 (and derived from
https://stackoverflow.com/questions/74138746/why-i-cant-parse-timestamp-in-pyarrow).
Assume you have the following code to read a JSON file with timestamps. The
timestamp strings include a sub-second part, so parsing fails if you specify
a second-resolution timestamp type:
{code:python}
import io

import pyarrow as pa
from pyarrow import json

s_json = """{"column":"2022-09-05T08:08:46.000"}"""
# Request second resolution, although the string carries a sub-second part
opts = json.ParseOptions(
    explicit_schema=pa.schema([("column", pa.timestamp("s"))]),
    unexpected_field_behavior="ignore",
)
json.read_json(io.BytesIO(s_json.encode()), parse_options=opts)
{code}
gives:
{code}
ArrowInvalid: Failed of conversion of JSON to timestamp[s], couldn't parse:2022-09-05T08:08:46.000
{code}
This error is expected, but I think it could be more informative about the
reason why parsing failed (at first sight the value looks like a proper
timestamp string, so you might be left wondering why it is rejected).
This might not be that straightforward, though, since there can be many
reasons why parsing fails.
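To illustrate the kind of hint the error could carry, here is a minimal pure-Python sketch. It is not Arrow's implementation: the helper name `explain_timestamp_failure`, the simplified pattern (no time zone handling), and the rule that the number of fractional digits must not exceed the unit's precision are all assumptions made for illustration. Only the second-resolution case is confirmed by the example above.

```python
import re

# Hypothetical helper, not part of pyarrow. Assumed rule: a string may carry
# at most as many fractional-second digits as the target unit can represent.
_DIGITS_PER_UNIT = {"s": 0, "ms": 3, "us": 6, "ns": 9}

# Simplified ISO 8601 pattern; time zone suffixes are deliberately not handled.
_ISO_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.(?P<frac>\d+))?$"
)


def explain_timestamp_failure(value, unit):
    """Return a hint for a failed parse, or None if no obvious cause is found."""
    match = _ISO_RE.match(value)
    if match is None:
        return f"{value!r} is not in the form YYYY-MM-DD[T ]hh:mm:ss[.fff]"
    frac = match.group("frac") or ""
    allowed = _DIGITS_PER_UNIT[unit]
    if len(frac) > allowed:
        return (
            f"{value!r} has {len(frac)} fractional-second digit(s), "
            f"but timestamp[{unit}] allows at most {allowed}"
        )
    return None


print(explain_timestamp_failure("2022-09-05T08:08:46.000", "s"))
```

A hint like this could be appended to the existing message when the cause is recognizable, and left out when it is not, which would sidestep the concern that parsing can fail for many reasons.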