[ https://issues.apache.org/jira/browse/ARROW-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380147#comment-17380147 ]
Weston Pace commented on ARROW-13328:
-------------------------------------
The asynchronous scanner can (and does) read synchronously when that is
warranted (e.g. IPC). So at the moment the asynchronous scanner is as fast as or
faster than the synchronous scanner in (I think) all cases, but it is only
noticeably faster when you're actually reading from I/O. If the data is all in
memory, there isn't much difference.
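
To illustrate how an asynchronous interface can "read synchronously when
warranted", here is a small standard-library C++ sketch. It is not Arrow's
scanner code: `Source` and `ReadAsync` are hypothetical names, and the I/O
branch only simulates a slow read by running on another thread. The point is
that for in-memory data the future is already complete when it is returned, so
the async path costs little more than a direct call.

```cpp
#include <cassert>
#include <chrono>  // used by callers polling the future with wait_for
#include <future>
#include <vector>

// Hypothetical data source: either already resident in memory or behind (simulated) I/O.
struct Source {
  bool in_memory;
  std::vector<int> data;
};

// Async read that completes inline when that is cheaper.
std::future<std::vector<int>> ReadAsync(const Source& src) {
  if (src.in_memory) {
    // Fulfil the promise on the caller's thread: no thread hop, no scheduling cost.
    std::promise<std::vector<int>> p;
    p.set_value(src.data);
    return p.get_future();
  }
  // Otherwise hand the (simulated) read to another thread so the caller can
  // overlap other work with it.
  return std::async(std::launch::async, [data = src.data] { return data; });
}
```

A caller can check `wait_for(std::chrono::seconds(0))` on the in-memory future
and see it is already ready, which matches the comment's observation that
in-memory scans show little difference between the two scanners.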
The synchronous scanner is less complex and was kept around so that, if a bug
were found in the asynchronous scanner, users could fall back to the
synchronous scanner and avoid regressions.
I agree with David, though, that we can at some point just get rid of the
synchronous scanner. I think Ben encountered several bugs (which were
addressed) in his use of the async scanner for ARROW-13238, so it's probably not
bug-free. On the other hand, we've been using it for a while now and I
wouldn't say we've run into anything major.
> [C++][Dataset] Use an ExecPlan for synchronous scans
> ----------------------------------------------------
>
> Key: ARROW-13328
> URL: https://issues.apache.org/jira/browse/ARROW-13328
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Ben Kietzman
> Priority: Major
> Labels: dataset
>
> ARROW-13238 ensured that asynchronous dataset scans are internally backed by
> ExecPlans, allowing easier integration with new ExecNode sinks.
> However, synchronous scans are still backed by {{FilterAndProjectScanTask}}
> and are not consumable using ExecNodes. Ideally, we should route synchronous
> scans through an ExecPlan as well.
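
The proposal amounts to implementing the synchronous API as a thin blocking
layer over the asynchronous, plan-backed path. The sketch below shows that
shape with the C++ standard library only; it is not Arrow code. `Batch`,
`Sink`, `StartAsyncScan` and `ScanSync` are hypothetical names, the "plan" is a
single filter step on a background thread, and a production version would also
need error propagation and cancellation.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch.
using Batch = std::vector<int>;

// Minimal thread-safe sink: the async side pushes batches; std::nullopt marks end-of-stream.
class Sink {
 public:
  void Push(std::optional<Batch> b) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push(std::move(b));
    }
    cv_.notify_one();
  }
  // Blocks until a batch (or the end marker) is available.
  std::optional<Batch> Pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    std::optional<Batch> b = std::move(q_.front());
    q_.pop();
    return b;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::optional<Batch>> q_;
};

// Hypothetical async pipeline: filters each fragment and delivers the results
// to the sink from a background thread.
std::thread StartAsyncScan(std::vector<Batch> fragments, int min_value, Sink& sink) {
  return std::thread([fragments = std::move(fragments), min_value, &sink] {
    for (const Batch& frag : fragments) {
      Batch out;
      for (int v : frag) {
        if (v >= min_value) out.push_back(v);  // filter step
      }
      sink.Push(std::move(out));
    }
    sink.Push(std::nullopt);  // signal end-of-stream
  });
}

// Synchronous scan built on the async pipeline: start it, then block on the
// sink until end-of-stream.
std::vector<Batch> ScanSync(std::vector<Batch> fragments, int min_value) {
  Sink sink;
  std::thread worker = StartAsyncScan(std::move(fragments), min_value, sink);
  std::vector<Batch> result;
  while (std::optional<Batch> b = sink.Pop()) {
    result.push_back(std::move(*b));
  }
  worker.join();
  return result;
}
```

With this layering there is only one execution path to maintain; the
synchronous entry point just waits on the same sink an asynchronous consumer
would read from.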