[
https://issues.apache.org/jira/browse/ARROW-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265168#comment-17265168
]
Weston Pace commented on ARROW-10183:
-------------------------------------
Latest round of benchmarks (5 iterations on each)
| |2.0.0 (Mean)|2.0.0 (StdDev)|Async (Mean)|Async (StdDev)|Async (Tasks)|Threaded (Mean)|Threaded (StdDev)|Threaded (Tasks)|
|gzip/cache|6.291222|0.095669|6.467804|0.035468|6229|6.262252|0.056097|4149|
|gzip/none|9.292271|0.251346|9.494446|0.273585|6229|9.22652|0.254951|4149|
|none/cache|1.226155|0.086003|1.245934|0.077262|6229|1.238495|0.073567|4149|
|none/none|34.326746|0.392563|35.091284|0.833403|6222|36.270428|2.033464|4149|
gzip means the source file was compressed with gzip, and cache means the source
file was cached in the OS cache. For the cache=none cases the benchmark times
are high: we have to make a file copy to ensure we are reading from the disk,
and this copy time is included in the benchmark. However, this overhead is
consistent across configurations. The 2.0.0 numbers come from the conda
pyarrow build (and thus the threaded table reader).
Results are fairly noisy, but I think there is some consistent minor
degradation in the async case. It could be the result of the higher number of
tasks, the high number of futures, using submit instead of spawn, or it could
just be noise.
> [C++] Create a ForEach library function that runs on an iterator of futures
> ---------------------------------------------------------------------------
>
> Key: ARROW-10183
> URL: https://issues.apache.org/jira/browse/ARROW-10183
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Major
> Labels: pull-request-available
> Attachments: arrow-continuation-flow.jpg
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> This method should take in an iterator of futures and a callback and pull an
> item off the iterator, "await" it, run the callback on it, and then fetch the
> next item from the iterator.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)