[ 
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575531#comment-17575531
 ] 

Yibo Cai commented on ARROW-17313:
----------------------------------

I have the same concern as Weston. CSV parsing is stateful. AFAIK, locating 
line breaks has to be done sequentially if we support quoting, escaping, 
customized delimiters, etc.
Some examples:
- The sample block starts inside a quoted field
- The first char of a block is "\n" but the last char of the previous block 
is an escape character
- A sample that starts in the middle of a "\r\n" pair may also be ambiguous
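
To illustrate the first case, here is a minimal sketch in Python. It assumes 
double-quote quoting (with "" as the escaped quote) and "\n" line breaks. The 
helper names record_starts and naive_next_record are hypothetical and are not 
part of Arrow's CSV reader; they only contrast a sequential scan with the 
"skip to the first line break" approach from the issue description.

```python
def record_starts(data: str, quote: str = '"') -> list[int]:
    """Offsets where records start, found by scanning sequentially from 0."""
    starts = [0]
    in_quotes = False
    for i, ch in enumerate(data):
        if ch == quote:
            # A doubled quote ("") toggles twice, so the state is preserved.
            in_quotes = not in_quotes
        elif ch == "\n" and not in_quotes and i + 1 < len(data):
            starts.append(i + 1)
    return starts


def naive_next_record(data: str, offset: int) -> int:
    """Skip past the first "\n" at or after offset, ignoring quote state."""
    nl = data.find("\n", offset)
    return len(data) if nl == -1 else nl + 1


# The second record contains a newline inside a quoted field.
data = 'id,note\n1,"line one\nline two"\n2,plain\n'

true_starts = record_starts(data)
# A byte range that begins inside the quoted field (offset 12).
guessed = naive_next_record(data, 12)

print(true_starts)             # real record boundaries
print(guessed in true_starts)  # False: the guess lands inside a record
```

Without the quote state carried over from the start of the file, the naive 
scan treats the newline inside the quoted field as a record boundary.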

> Add Byte Range to CSV Reader ReadOptions
> ----------------------------------------
>
>                 Key: ARROW-17313
>                 URL: https://issues.apache.org/jira/browse/ARROW-17313
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Ziheng Wang
>            Assignee: Ziheng Wang
>            Priority: Major
>
> Sometimes it's desirable to just read a portion of a CSV. The best way to do 
> that is to pass in a list of byte ranges to CSV read options that specify 
> where in the CSV you want to read. These byte ranges don't necessarily have 
> to be aligned on line break boundaries, the CSV reader should just read until 
> the end of the line, and skip anything before the first line break in a byte 
> range.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
