[ https://issues.apache.org/jira/browse/ARROW-11889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weston Pace resolved ARROW-11889.
---------------------------------
Resolution: Fixed
Issue resolved by pull request 10568
[https://github.com/apache/arrow/pull/10568]
> [C++] Add parallelism to streaming CSV reader
> ---------------------------------------------
>
> Key: ARROW-11889
> URL: https://issues.apache.org/jira/browse/ARROW-11889
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Major
> Labels: pull-request-available
> Fix For: 5.0.0
>
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> Currently the streaming CSV reader allows very little parallelism. It cannot
> read more than one segment at a time (which would be useful with S3), and it
> does not fan out across columns when parsing and converting.
> Both options would likely speed up CSV reading in some scenarios, although
> the benefit may be mostly mitigated when there are many more files than
> cores (per-file parallelism will occupy all the cores anyway).
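
The two kinds of parallelism described above can be sketched in plain C++. This is only an illustration under stated assumptions, not the code from the pull request and not Arrow's API: `ParsedBlock`, `ParseSegment`, `ParseSegmentsParallel`, `ConvertColumn`, and `ConvertBlockParallel` are hypothetical names, every cell is assumed to be an integer, and each segment is assumed to end on a row boundary.

```cpp
#include <cstddef>
#include <future>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory layout for one parsed segment: rows of raw cells.
using ParsedBlock = std::vector<std::vector<std::string>>;

// Parse one text segment into rows of cells (no quoting or escaping support).
ParsedBlock ParseSegment(const std::string& segment) {
  ParsedBlock rows;
  std::istringstream lines(segment);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.empty()) continue;
    std::vector<std::string> cells;
    std::istringstream fields(line);
    std::string cell;
    while (std::getline(fields, cell, ',')) cells.push_back(cell);
    rows.push_back(std::move(cells));
  }
  return rows;
}

// Segment-level parallelism: start parsing every segment before waiting on
// any of them, then collect the results in the original order.
std::vector<ParsedBlock> ParseSegmentsParallel(
    const std::vector<std::string>& segments) {
  std::vector<std::future<ParsedBlock>> tasks;
  tasks.reserve(segments.size());
  for (const auto& segment : segments) {
    tasks.push_back(std::async(std::launch::async,
                               [&segment] { return ParseSegment(segment); }));
  }
  std::vector<ParsedBlock> blocks;
  blocks.reserve(tasks.size());
  for (auto& task : tasks) blocks.push_back(task.get());
  return blocks;
}

// Convert a single column of a parsed block to integers.
std::vector<long long> ConvertColumn(const ParsedBlock& block,
                                     std::size_t col) {
  std::vector<long long> out;
  out.reserve(block.size());
  for (const auto& row : block) out.push_back(std::stoll(row.at(col)));
  return out;
}

// Column fan-out: one conversion task per column, gathered in column order.
std::vector<std::vector<long long>> ConvertBlockParallel(
    const ParsedBlock& block) {
  const std::size_t num_cols = block.empty() ? 0 : block.front().size();
  std::vector<std::future<std::vector<long long>>> tasks;
  tasks.reserve(num_cols);
  for (std::size_t col = 0; col < num_cols; ++col) {
    tasks.push_back(std::async(
        std::launch::async, [&block, col] { return ConvertColumn(block, col); }));
  }
  std::vector<std::vector<long long>> columns;
  columns.reserve(num_cols);
  for (auto& task : tasks) columns.push_back(task.get());
  return columns;
}
```

Calling `ParseSegmentsParallel` on a list of text segments and then `ConvertBlockParallel` on each resulting block exercises both paths. A real reader would presumably bound the number of in-flight segments and reuse a thread pool rather than spawn a thread per task; the sketch omits both for brevity.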
--
This message was sent by Atlassian Jira
(v8.3.4#803005)