Weston Pace created ARROW-11889:
-----------------------------------
Summary: [C++] Add parallelism to streaming CSV reader
Key: ARROW-11889
URL: https://issues.apache.org/jira/browse/ARROW-11889
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
Currently the streaming CSV reader allows very little parallelism. It cannot
read more than one segment at a time (which would be useful on S3, where
individual reads have high latency), and it does not fan out across columns
when parsing and converting.
Both options would likely speed up CSV reading in some scenarios, although
the benefit may be largely lost when there are many more files than cores,
since per-file parallelism will already occupy all the cores.
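To make the column fan-out idea concrete, here is a minimal sketch using only
the C++ standard library. It is not the Arrow implementation: ParseBlock,
ConvertColumn and ConvertBlockParallel are hypothetical names, every column is
assumed to hold integers, quoting and escaping are ignored, and std::async
stands in for whatever thread pool the reader would actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow API): split one block of CSV text into
// per-column vectors of raw cells. Quoting and escaping are not handled.
std::vector<std::vector<std::string>> ParseBlock(const std::string& block,
                                                 std::size_t num_columns) {
  std::vector<std::vector<std::string>> columns(num_columns);
  std::istringstream lines(block);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.empty()) continue;
    std::istringstream cells(line);
    std::string cell;
    std::size_t col = 0;
    while (col < num_columns && std::getline(cells, cell, ',')) {
      columns[col++].push_back(cell);
    }
    // Assumption: every row has at least num_columns non-empty cells.
    assert(col == num_columns);
  }
  return columns;
}

// Hypothetical helper: convert one column of raw cells to 64-bit integers.
std::vector<std::int64_t> ConvertColumn(const std::vector<std::string>& cells) {
  std::vector<std::int64_t> out;
  out.reserve(cells.size());
  for (const auto& cell : cells) out.push_back(std::stoll(cell));
  return out;
}

// Column fan-out: launch one conversion task per column, then collect the
// results in column order. `columns` outlives every task because all
// futures are waited on before this function returns.
std::vector<std::vector<std::int64_t>> ConvertBlockParallel(
    const std::vector<std::vector<std::string>>& columns) {
  std::vector<std::future<std::vector<std::int64_t>>> tasks;
  tasks.reserve(columns.size());
  for (const auto& column : columns) {
    tasks.push_back(std::async(std::launch::async,
                               [&column] { return ConvertColumn(column); }));
  }
  std::vector<std::vector<std::int64_t>> converted;
  converted.reserve(tasks.size());
  for (auto& task : tasks) converted.push_back(task.get());
  return converted;
}
```

For example, ConvertBlockParallel(ParseBlock("1,2,3\n4,5,6\n", 3)) converts
the three columns concurrently. Reading several segments at once could be
layered on the same way, with one task per in-flight block feeding this
conversion step.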