[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578080#comment-17578080 ]
Weston Pace commented on ARROW-17313:
-------------------------------------

There is also ARROW-15589, which I had referenced above, also motivated from Spark (but via Substrait).

> [C++] Add Byte Range to CSV Reader ReadOptions
> ----------------------------------------------
>
>                 Key: ARROW-17313
>                 URL: https://issues.apache.org/jira/browse/ARROW-17313
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Ziheng Wang
>            Assignee: Ziheng Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Sometimes it's desirable to just read a portion of a CSV. The best way to do
> that is to pass in a list of byte ranges to CSV read options that specify
> where in the CSV you want to read. These byte ranges don't necessarily have
> to be aligned on line break boundaries, the CSV reader should just read until
> the end of the line, and skip anything before the first line break in a byte
> range.
> Based on discussion, the scope is going to be reduced here. The first
> implementation will support a single byte range that is already assumed to be
> aligned on byte boundaries.
> Will not handle quotes/returns and other edge cases.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
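
For illustration only, the unaligned byte-range behavior described in the issue (skip anything before the first line break in the range, then read through the end of the last line) could be sketched roughly as below. This is not the Arrow implementation or its API: the function name read_lines_in_range is hypothetical, it uses only the Python standard library, and it assumes a Hadoop-style ownership convention in which a line starting exactly at the end offset belongs to the earlier range. As in the issue, quoted fields containing line breaks are not handled.

```python
import io


def read_lines_in_range(stream, start, end):
    """Return the complete lines owned by the byte range [start, end].

    Hypothetical helper, not part of Arrow. Assumes a seekable binary
    stream and that line breaks never occur inside quoted fields.
    """
    stream.seek(start)
    if start > 0:
        # Discard everything up to and including the first line break;
        # that (possibly partial) line is owned by the previous range.
        stream.readline()
    lines = []
    # A line is owned by this range if it begins at or before `end`,
    # so the last line may extend past `end`.
    while stream.tell() <= end:
        line = stream.readline()
        if not line:  # end of file
            break
        lines.append(line)
    return lines


csv_bytes = io.BytesIO(b"a,b\n1,2\n3,4\n5,6\n")
first = read_lines_in_range(csv_bytes, 0, 6)    # range ends mid-line
second = read_lines_in_range(csv_bytes, 6, 16)  # range starts mid-line
```

With this convention, adjacent ranges that together cover the file return every line exactly once, whether or not the split offset falls on a line break.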