[
https://issues.apache.org/jira/browse/ARROW-13252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ARROW-13252:
-----------------------------------
Labels: pull-request-available (was: )
> [C++] CSV Add byte offset for error messages
> --------------------------------------------
>
> Key: ARROW-13252
> URL: https://issues.apache.org/jira/browse/ARROW-13252
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Nate Clark
> Assignee: Nate Clark
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> CSV parsing error messages contain the row number when parallel parsing is
> disabled, but when parallel parsing is enabled there is no indication of
> where in the input the error occurred. To add that context, the row's byte
> offset can be included in the error message.
>
> This can be done relatively easily for the parser, but associating byte
> offsets with the data or row being decoded would require more metadata to be
> maintained in the DataBatch, potentially doubling the size of ParsedValueDesc.
>
> This was discussed in the comments
> [here|https://github.com/apache/arrow/pull/10202#issuecomment-870796708].
--
This message was sent by Atlassian Jira
(v8.3.4#803005)