[ https://issues.apache.org/jira/browse/ARROW-11889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308885#comment-17308885 ]
Antoine Pitrou commented on ARROW-11889:
----------------------------------------
I'll add that this probably means making {{ColumnDecoder}} async (perhaps
turning it into a generator).
It would be nice if the solution could also tackle ARROW-11853 at the same
time, since both issues will require significant reworking of the
{{ColumnDecoder}} internals anyway.
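To make the generator idea concrete, here is a minimal standalone C++17 sketch that assumes a greatly simplified decoder. The names `AsyncGenerator`, `BlockDecoder`, `MakeAsyncDecoder` and `CollectAll` are hypothetical illustrations, not Arrow's actual classes or signatures. The generator is modeled as a callable that returns a `std::future` for the next item, and `std::nullopt` marks the end of the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical async generator shape: each call returns a future for the
// next item, and std::nullopt marks the end of the stream.
template <typename T>
using AsyncGenerator = std::function<std::future<std::optional<T>>()>;

// Hypothetical stand-in for a synchronous ColumnDecoder: it hands out one
// "decoded" value per input block (here just block * 10).
class BlockDecoder {
 public:
  explicit BlockDecoder(std::vector<int> blocks) : blocks_(std::move(blocks)) {}

  std::optional<int> Next() {
    if (pos_ == blocks_.size()) return std::nullopt;
    return blocks_[pos_++] * 10;
  }

 private:
  std::vector<int> blocks_;
  std::size_t pos_ = 0;
};

// Wrap the pull-based decoder as an async generator. The decoder is held by
// shared_ptr so its position advances across calls instead of being copied.
// Next() is not thread-safe, so callers must wait on each future before
// requesting the next item.
AsyncGenerator<int> MakeAsyncDecoder(std::shared_ptr<BlockDecoder> decoder) {
  assert(decoder != nullptr);
  return [decoder]() {
    return std::async(std::launch::async, [decoder]() { return decoder->Next(); });
  };
}

// Drain a generator one item at a time, in order.
std::vector<int> CollectAll(const AsyncGenerator<int>& gen) {
  std::vector<int> out;
  while (std::optional<int> item = gen().get()) {
    out.push_back(*item);
  }
  return out;
}
```

With input blocks `{1, 2, 3}`, draining the generator with `CollectAll` yields `{10, 20, 30}`. This wrapper adds no decode parallelism by itself, because items are still requested one at a time. Its value is that the consumer is no longer blocked inside the decoder, and that is the shape a readahead or fan-out layer could build on.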
> [C++] Add parallelism to streaming CSV reader
> ---------------------------------------------
>
> Key: ARROW-11889
> URL: https://issues.apache.org/jira/browse/ARROW-11889
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
> Fix For: 5.0.0
>
>
> Currently the streaming CSV reader does not allow for much parallelism. It
> doesn't allow for reading more than one segment at once (useful in S3) and it
> doesn't allow for column fan-out for parsing & converting.
> It seems both of these options would speed up CSV reading in some scenarios,
> although it's possible this is mostly mitigated in cases where there are many
> more files than cores (as per-file parallelism will occupy all the cores
> anyway).
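The column fan-out idea can be sketched in standalone C++17 as shown below. The sketch assumes that a block has already been split into rows of field strings and that every column converts to int64. `ParsedBlock`, `ConvertColumn` and `ConvertBlockParallel` are hypothetical names, not part of Arrow. One `std::async` task per column stands in for a real thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <string>
#include <vector>

// Hypothetical parsed block: rows of raw CSV field strings.
using ParsedBlock = std::vector<std::vector<std::string>>;
using Int64Column = std::vector<std::int64_t>;

// Convert one column of a parsed block from text to int64.
Int64Column ConvertColumn(const ParsedBlock& block, std::size_t col) {
  Int64Column out;
  out.reserve(block.size());
  for (const auto& row : block) {
    assert(col < row.size());  // assumes every row has the same width
    out.push_back(std::stoll(row[col]));
  }
  return out;
}

// Column fan-out: one conversion task per column, joined in column order.
// The tasks only read `block`. Each future is either collected with get()
// or, if an exception escapes, waited on by its destructor, so `block`
// outlives every task.
std::vector<Int64Column> ConvertBlockParallel(const ParsedBlock& block) {
  if (block.empty()) return {};
  const std::size_t num_cols = block.front().size();

  std::vector<std::future<Int64Column>> tasks;
  tasks.reserve(num_cols);
  for (std::size_t col = 0; col < num_cols; ++col) {
    tasks.push_back(std::async(std::launch::async,
                               [&block, col] { return ConvertColumn(block, col); }));
  }

  std::vector<Int64Column> columns;
  columns.reserve(num_cols);
  for (auto& task : tasks) {
    columns.push_back(task.get());
  }
  return columns;
}
```

For the block `{{"1", "10"}, {"2", "20"}, {"3", "30"}}` this returns the columns `{1, 2, 3}` and `{10, 20, 30}`. A thread per column is only acceptable in a sketch. As the description notes, when many files are already being read concurrently, the extra fan-out may buy little.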