ASF GitHub Bot updated ARROW-12386:
-----------------------------------
Labels: pull-request-available (was: )
> [C++] Support file parallelism in AsyncScanner
> ----------------------------------------------
>
> Key: ARROW-12386
> URL: https://issues.apache.org/jira/browse/ARROW-12386
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Whether we pull from files in parallel is controlled by how we merge
> the batch streams in `AsyncScanner::ScanBatchesUnorderedAsync`. Currently we
> rely on `MakeConcatenatedGenerator`, which is incorrect because it prevents
> file parallelism. It is only used as a workaround because `MakeMergedGenerator`
> pulls from its source (an `EnumeratingGenerator`) in an async-reentrant
> fashion. `MakeMergedGenerator` should not do this. If some kind of readahead
> is truly necessary there, then use `MakeReadaheadGenerator`.
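
To illustrate the difference between concatenating and merging batch streams, here is a minimal standard-library sketch. It does not use the Arrow API: `FileStream`, `MergeStreams`, and the `max_open` parameter are hypothetical names invented for this example, and the streams are synchronous stand-ins for the async generators discussed above. The point is only that a merge with a single open stream behaves like concatenation, while a merge with several open streams interleaves batches from different files.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

using Batch = std::string;

// Hypothetical stand-in for one file's batch stream (not an Arrow type).
class FileStream {
 public:
  FileStream(int file_index, int total_batches)
      : file_index_(file_index), total_(total_batches), next_(0) {}

  bool Done() const { return next_ >= total_; }

  // Returns the next batch label; only call when !Done().
  Batch Next() {
    return "file" + std::to_string(file_index_) + "/batch" +
           std::to_string(next_++);
  }

 private:
  int file_index_;
  int total_;
  int next_;
};

// Pulls batches from at most `max_open` files at a time, round-robin.
// With max_open == 1 this degenerates to concatenation: a file is fully
// drained before the next one is opened.
std::vector<Batch> MergeStreams(const std::vector<FileStream>& files,
                                std::size_t max_open) {
  if (max_open == 0) max_open = 1;  // assumption: at least one open file
  std::vector<Batch> out;
  std::deque<FileStream> active;
  std::size_t next_file = 0;
  while (true) {
    // Open more files until the limit is reached or none are left.
    while (active.size() < max_open && next_file < files.size()) {
      active.push_back(files[next_file++]);
    }
    if (active.empty()) break;
    FileStream stream = active.front();
    active.pop_front();
    if (!stream.Done()) out.push_back(stream.Next());
    // Streams with batches left go to the back of the queue.
    if (!stream.Done()) active.push_back(stream);
  }
  return out;
}
```

Calling `MergeStreams(files, 1)` on two files of two batches each yields all of file 0 followed by all of file 1, whereas `MergeStreams(files, 2)` alternates between the two files. In the real scanner the benefit of merging comes from overlapping I/O across files, which this synchronous sketch does not model.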