[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576937#comment-17576937 ]
Weston Pace commented on ARROW-17313:
-------------------------------------
That sounds good to me. Even if we later unify everything under a common
"byte ranges" or "percentages" API, I don't think there is any harm in also
having format-specific APIs. Plus, having the format-specific APIs should
simplify adoption of a common API if we decide to go that route.
> [C++] Add Byte Range to CSV Reader ReadOptions
> ----------------------------------------------
>
> Key: ARROW-17313
> URL: https://issues.apache.org/jira/browse/ARROW-17313
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Ziheng Wang
> Assignee: Ziheng Wang
> Priority: Major
>
> Sometimes it's desirable to read just a portion of a CSV. The best way to do
> that is to pass a list of byte ranges to the CSV read options, specifying
> where in the CSV you want to read. These byte ranges don't necessarily have
> to be aligned on line break boundaries: for each range, the CSV reader should
> skip anything before the first line break in the range and then read through
> to the end of the line that the range ends in.
> Based on discussion, the scope is going to be reduced here. The first
> implementation will support a single byte range that is assumed to be
> already aligned on line break boundaries.
> It will not handle quotes, returns, or other edge cases.
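
For illustration only, the unaligned byte range behaviour described in the issue could be sketched in plain Python roughly as below. This is not the Arrow API: `read_csv_byte_range` is a hypothetical helper, and the convention that a row belongs to the range containing its first byte (so adjacent ranges neither drop nor duplicate rows) is an assumption, not something the issue specifies. Like the reduced scope above, the sketch ignores quoted fields that contain line breaks.

```python
import csv
import io


def read_csv_byte_range(data: bytes, start: int, end: int) -> list[list[str]]:
    """Hypothetical helper: parse the CSV rows whose first byte is in [start, end).

    Assumes "\n"-terminated rows and no line breaks inside quoted fields.
    """
    if start <= 0:
        pos = 0
    else:
        # Skip the partial row in progress at `start`. Searching from
        # start - 1 keeps a row that begins exactly at `start`.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return []
        pos = newline + 1

    lines = []
    # Every row that starts inside the range is read to its end,
    # even if that end lies past `end`.
    while pos < min(end, len(data)):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline + 1
        lines.append(data[pos:stop])
        pos = stop

    text = b"".join(lines).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


data = b"a,b\n1,2\n3,4\n5,6\n"
first = read_csv_byte_range(data, 0, 6)    # [['a', 'b'], ['1', '2']]
second = read_csv_byte_range(data, 6, 16)  # [['3', '4'], ['5', '6']]
```

Note that only the range starting at byte 0 sees the header row, so a real implementation would need the column names supplied some other way for later ranges.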
--
This message was sent by Atlassian Jira
(v8.20.10#820010)