[ 
https://issues.apache.org/jira/browse/ARROW-11889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace resolved ARROW-11889.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 10568
[https://github.com/apache/arrow/pull/10568]

> [C++] Add parallelism to streaming CSV reader
> ---------------------------------------------
>
>                 Key: ARROW-11889
>                 URL: https://issues.apache.org/jira/browse/ARROW-11889
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently the streaming CSV reader does not allow for much parallelism. It
> cannot read more than one segment at a time (which would be useful on S3),
> and it does not fan out across columns for parsing and converting.
> Both of these options would likely speed up CSV reading in some scenarios,
> although the benefit may be mostly mitigated when there are many more files
> than cores (since per-file parallelism will occupy all the cores anyway).
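
For readers unfamiliar with the term, the "column fan-out" mentioned in the description can be sketched roughly as follows. This is only an illustration of the general idea using the Python standard library, not the Arrow C++ implementation; the helper names (convert_column, read_block) are hypothetical, and the type inference is deliberately simplistic. Each parsed block is split into columns, and every column is converted as an independent task on a thread pool.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def convert_column(values):
    """Convert one column of strings: try int, then float, else keep strings."""
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except ValueError:
            continue
    return list(values)


def read_block(text, executor):
    """Parse one CSV block, then fan the columns out to the executor."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Transpose rows into columns so each column becomes its own task.
    columns = list(zip(*body)) if body else [() for _ in header]
    converted = executor.map(convert_column, columns)
    return dict(zip(header, converted))


# Hypothetical sample block; a real reader would feed successive blocks.
with ThreadPoolExecutor(max_workers=4) as pool:
    table = read_block("a,b,c\n1,2.5,x\n3,4.0,y\n", pool)
```

In pure Python the interpreter lock limits any real speedup from this pattern, so the sketch shows only the task structure. Reading several segments at once, the other option in the description, would be the analogous step of keeping more than one block in flight.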



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
