Nate Clark created ARROW-13252:
----------------------------------

             Summary: [C++] CSV Add byte offset for error messages
                 Key: ARROW-13252
                 URL: https://issues.apache.org/jira/browse/ARROW-13252
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Nate Clark
            Assignee: Nate Clark


CSV parsing error messages contain the row number when parallel parsing is 
not enabled, but when parallel parsing is enabled there is no indication of 
where in the input the error occurred. To provide that context, the byte 
offset of the row can be added to the error message.
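
As a rough illustration of the idea (not Arrow's actual parser), the sketch below scans one block of CSV text and reports the absolute byte offset of the first row with the wrong number of columns. The function name FindBadRow, its parameters, and the message format are all hypothetical, and quoting and escaping are ignored for brevity. Because each block carries its own starting offset, the reported position stays meaningful even when blocks are parsed in parallel and no global row count is available.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Arrow. Scans one block of CSV text and
// returns an error message for the first row whose column count is wrong.
// `block_offset` is the byte position of `block` within the whole input, so
// the reported offset is absolute regardless of which thread parses the block.
// Simplification: quoted fields and escapes are not handled.
std::optional<std::string> FindBadRow(std::string_view block,
                                      std::size_t block_offset,
                                      int expected_columns) {
  assert(expected_columns > 0);
  std::size_t row_start = 0;
  while (row_start < block.size()) {
    // Locate the end of the current row (newline or end of block).
    std::size_t row_end = block.find('\n', row_start);
    if (row_end == std::string_view::npos) row_end = block.size();
    std::string_view row = block.substr(row_start, row_end - row_start);

    // Count columns by counting delimiters.
    int columns = 1;
    for (char c : row) {
      if (c == ',') ++columns;
    }

    if (columns != expected_columns) {
      return "CSV parse error: expected " + std::to_string(expected_columns) +
             " columns, got " + std::to_string(columns) +
             " (row starts at byte offset " +
             std::to_string(block_offset + row_start) + ")";
    }
    row_start = row_end + 1;  // Skip past the newline.
  }
  return std::nullopt;
}
```

A caller that splits the input into blocks would pass each block's starting position as `block_offset`; the row's offset within the block is then the only extra state the scan needs to track.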

 

This can be done relatively easily for the parser, but associating byte offsets 
with the data or row being decoded would require more metadata to be maintained 
in the DataBatch, potentially doubling the size of ParsedValueDesc.
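
To make the size concern concrete, here is a minimal sketch that assumes a per-value descriptor packed into 32 bits (a 31-bit offset plus a 1-bit quoted flag). The struct names and field layout are hypothetical stand-ins and are not taken from the Arrow sources; the point is only that adding a 32-bit input byte offset to such a descriptor doubles its size.

```cpp
#include <cstdint>

// Hypothetical stand-in for a per-value descriptor: a 31-bit offset into the
// parsed data plus a 1-bit "quoted" flag, packed into a single 32-bit word.
struct ValueDesc {
  std::uint32_t offset : 31;
  std::uint32_t quoted : 1;
};

// The same descriptor extended with a 32-bit byte offset into the original
// input (an assumed width; a 64-bit offset would cost even more).
struct ValueDescWithByteOffset {
  std::uint32_t offset : 31;
  std::uint32_t quoted : 1;
  std::uint32_t input_byte_offset;
};

static_assert(sizeof(ValueDesc) == 4, "descriptor packs into 32 bits");
static_assert(sizeof(ValueDescWithByteOffset) == 2 * sizeof(ValueDesc),
              "per-value byte offset doubles the descriptor size");
```

Since one descriptor exists per parsed value, the increase scales with the number of cells, whereas tracking one offset per row in the parser adds far less.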

 

This was mentioned and discussed in the comments 
[here|https://github.com/apache/arrow/pull/10202#issuecomment-870796708].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
