[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ziheng Wang updated ARROW-17313:
--------------------------------
Description:
Sometimes it's desirable to read only a portion of a CSV file. The best way to
do that is to pass a list of byte ranges to the CSV read options, specifying
which parts of the file to read. These byte ranges don't necessarily have to be
aligned on line break boundaries: for each range, the CSV reader should skip
anything before the first line break in the range and then read through to the
end of the line in which the range ends.
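The boundary rule described above can be sketched in plain Python. This is only an illustration of the intended semantics, not the Arrow implementation: the function name read_lines_in_byte_range is hypothetical, the sketch assumes "\n" line endings, and it ignores quoted fields that contain embedded line breaks. It also assumes that the first range starts at offset 0 and that a line starting exactly at the end offset of a range belongs to that range, so that adjacent ranges never drop or duplicate a line.

```python
import io


def read_lines_in_byte_range(stream, start, end):
    """Return the complete lines that the byte range [start, end) is responsible for.

    Hypothetical helper for illustration only; not part of any Arrow API.
    """
    stream.seek(start)
    if start != 0:
        # Skip everything up to and including the first line break; the
        # preceding range reads through to the end of that line.
        stream.readline()
    lines = []
    # Keep reading while the next line starts at or before `end`, so the line
    # in which the range ends is read to completion.
    while stream.tell() <= end:
        line = stream.readline()
        if not line:  # end of file
            break
        lines.append(line)
    return lines


data = b"a,b\n1,2\n3,4\n5,6\n"
ranges = [(0, 5), (5, 11), (11, 16)]  # none of the inner offsets is line-aligned
chunks = [read_lines_in_byte_range(io.BytesIO(data), s, e) for s, e in ranges]
```

With these conventions, contiguous ranges that cover the whole file return every line exactly once, even though the range boundaries fall in the middle of lines.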
Based on discussion, the scope is being reduced. The first implementation will
support a single byte range that is assumed to be already aligned on line break
boundaries.
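For the reduced scope, a single pre-aligned range needs no boundary handling at all. The following is a minimal sketch using only the Python standard library, assuming that "aligned" means the range starts at the beginning of a line and ends just after a line break, and that the caller supplies the column names because the header row may lie outside the range. The function name read_aligned_range and its parameters are hypothetical and do not reflect the option names used in Arrow.

```python
import csv
import io


def read_aligned_range(data, start, end, column_names):
    """Parse the rows in data[start:end], assuming both offsets sit on line breaks.

    Hypothetical helper for illustration only; not part of any Arrow API.
    """
    chunk = data[start:end].decode("utf-8")
    # Column names are passed in explicitly because the header line is not
    # necessarily inside the requested range.
    reader = csv.DictReader(io.StringIO(chunk, newline=""), fieldnames=column_names)
    return list(reader)


data = b"a,b\n1,2\n3,4\n5,6\n"
# Bytes 4..12 cover exactly the two lines "1,2" and "3,4".
rows = read_aligned_range(data, 4, 12, ["a", "b"])
```

Because the range is trusted to be aligned, a misaligned range would silently yield a truncated first or last row; a real implementation may want to validate or document that precondition.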
was: Sometimes it's desirable to read only a portion of a CSV file. The best way
to do that is to pass a list of byte ranges to the CSV read options, specifying
which parts of the file to read. These byte ranges don't necessarily have to be
aligned on line break boundaries: for each range, the CSV reader should skip
anything before the first line break in the range and then read through to the
end of the line in which the range ends.
> [C++] Add Byte Range to CSV Reader ReadOptions
> ----------------------------------------------
>
> Key: ARROW-17313
> URL: https://issues.apache.org/jira/browse/ARROW-17313
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Ziheng Wang
> Assignee: Ziheng Wang
> Priority: Major
>
> Sometimes it's desirable to read only a portion of a CSV file. The best way
> to do that is to pass a list of byte ranges to the CSV read options,
> specifying which parts of the file to read. These byte ranges don't
> necessarily have to be aligned on line break boundaries: for each range, the
> CSV reader should skip anything before the first line break in the range and
> then read through to the end of the line in which the range ends.
> Based on discussion, the scope is being reduced. The first implementation
> will support a single byte range that is assumed to be already aligned on
> line break boundaries.