Weston Pace created ARROW-11889:
-----------------------------------

             Summary: [C++] Add parallelism to streaming CSV reader
                 Key: ARROW-11889
                 URL: https://issues.apache.org/jira/browse/ARROW-11889
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


Currently the streaming CSV reader allows very little parallelism.  It cannot 
read more than one segment at a time (which would be useful on S3), and it 
does not fan out across columns when parsing and converting.
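
To make the column fan-out idea concrete, here is a minimal sketch using only 
the C++ standard library.  It is not Arrow's implementation: ParsedBlock, 
ConvertColumn and ConvertBlockParallel are hypothetical names, and converting 
every column to int64 stands in for real per-column type conversion.

```cpp
#include <cstdint>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one parsed CSV block:
// block[c][r] holds the raw text of column c, row r.
using ParsedBlock = std::vector<std::vector<std::string>>;

// Convert one column of raw cells to integers.
// Assumption: every cell is a valid integer; std::stoll throws otherwise.
std::vector<std::int64_t> ConvertColumn(const std::vector<std::string>& cells) {
  std::vector<std::int64_t> out;
  out.reserve(cells.size());
  for (const auto& cell : cells) {
    out.push_back(std::stoll(cell));
  }
  return out;
}

// Column fan-out: launch one conversion task per column, then gather the
// results in column order.  The block must outlive all of the tasks, which
// holds here because every future is joined before returning.
std::vector<std::vector<std::int64_t>> ConvertBlockParallel(
    const ParsedBlock& block) {
  std::vector<std::future<std::vector<std::int64_t>>> tasks;
  tasks.reserve(block.size());
  for (const auto& column : block) {
    tasks.push_back(std::async(std::launch::async,
                               [&column] { return ConvertColumn(column); }));
  }

  std::vector<std::vector<std::int64_t>> converted;
  converted.reserve(tasks.size());
  for (auto& task : tasks) {
    converted.push_back(task.get());  // blocks until that column is done
  }
  return converted;
}
```

A real reader would presumably submit these tasks to a shared thread pool 
rather than spawn one thread per column, so that fan-out competes fairly with 
any per-file parallelism already in use.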

Both of these options would likely speed up CSV reading in some scenarios, 
although the benefit may be largely mitigated when there are many more files 
than cores, since per-file parallelism will already occupy all the cores.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
