[ 
https://issues.apache.org/jira/browse/ARROW-13252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13252:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] CSV Add byte offset for error messages
> --------------------------------------------
>
>                 Key: ARROW-13252
>                 URL: https://issues.apache.org/jira/browse/ARROW-13252
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Nate Clark
>            Assignee: Nate Clark
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> CSV parsing error messages contain the row number when parallel parsing is 
> not enabled, but when parallel parsing is enabled there is no indication of 
> where the error occurred in the input. To add that context, the byte offset 
> of the row can be included in the error message.
>  
> This can be done relatively easily for the parser, but associating byte 
> offsets with the data or row being decoded would require more metadata to be 
> maintained in the DataBatch, potentially doubling the size of 
> ParsedValueDesc.
>  
> This was mentioned and discussed in comments 
> [here|https://github.com/apache/arrow/pull/10202#issuecomment-870796708]
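
To illustrate the idea in the description, here is a minimal sketch of how a parser could report an absolute row byte offset when blocks are parsed independently. It is not Arrow's implementation: the function name FindColumnCountError, its parameters, and the message wording are all hypothetical, and the field splitting is deliberately naive (no quoting or escaping). The only assumption carried over from the issue is that each block knows its starting byte position in the input.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Arrow's API. Scans one block of CSV text
// and returns an error message for the first row whose column count differs
// from `expected_columns`. `block_offset` is the byte position of the block
// within the whole input, so the reported offset is absolute even when each
// block is parsed on its own (as with parallel parsing).
std::optional<std::string> FindColumnCountError(std::string_view block,
                                                int64_t block_offset,
                                                int expected_columns) {
  std::size_t row_start = 0;
  while (row_start < block.size()) {
    std::size_t row_end = block.find('\n', row_start);
    if (row_end == std::string_view::npos) row_end = block.size();
    std::string_view row = block.substr(row_start, row_end - row_start);

    // Naive field count: this sketch does not handle quoted delimiters.
    int columns = 1;
    for (char c : row) {
      if (c == ',') ++columns;
    }

    if (columns != expected_columns) {
      return "CSV parse error: row at byte offset " +
             std::to_string(block_offset + static_cast<int64_t>(row_start)) +
             ": expected " + std::to_string(expected_columns) +
             " columns, got " + std::to_string(columns);
    }
    row_start = row_end + 1;  // Skip past the newline to the next row.
  }
  return std::nullopt;  // Every row had the expected number of columns.
}
```

Because the offset is derived from the block's own starting position plus a local row start, no global row counter is needed, which is why this is cheap on the parser side. Carrying a similar offset through to the decoding stage is the part the description flags as costly.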



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
