[ 
https://issues.apache.org/jira/browse/ARROW-13328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380147#comment-17380147
 ] 

Weston Pace commented on ARROW-13328:
-------------------------------------

The asynchronous scanner can (and does) read synchronously when it is warranted 
(e.g. IPC).  So at the moment the asynchronous scanner is as fast as or faster 
than the synchronous scanner in (I think) all cases, but only noticeably faster 
when you're actually reading from I/O.  If the data is all in memory there isn't 
much difference.
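
As a rough illustration of "asynchronous, but synchronous when warranted", here is a 
minimal C++ sketch using only the standard library.  It is not Arrow's scanner API: 
`Source`, `ReadFromIO`, and `ScanAsync` are hypothetical names, and the sleep merely 
stands in for I/O latency.  The caller always receives a future, but for in-memory 
data the future is already satisfied, so no thread hop is paid.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical source descriptor (not an Arrow type).
struct Source {
  bool in_memory;            // data already resident, e.g. an IPC buffer
  std::vector<int> batches;  // stand-in for record batches
};

// Stand-in for a blocking read from disk or network.
std::vector<int> ReadFromIO(const Source& src) {
  std::this_thread::sleep_for(std::chrono::milliseconds(10));  // simulated latency
  return src.batches;
}

// Async-style scan: always returns a future. When the data is already in
// memory, the work is done synchronously and the future comes back ready.
std::future<std::vector<int>> ScanAsync(const Source& src) {
  if (src.in_memory) {
    std::promise<std::vector<int>> p;
    p.set_value(src.batches);  // completed inline, no extra thread
    return p.get_future();
  }
  // Real I/O: run the read on another thread so the caller can overlap work.
  return std::async(std::launch::async, ReadFromIO, src);
}
```

With `Source mem{true, {1, 2, 3}}`, `ScanAsync(mem).wait_for(std::chrono::seconds(0))` 
reports `std::future_status::ready` immediately, which is why in-memory scans show 
little difference between the two approaches.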

The synchronous scanner is less complex and was kept around so that, if a bug 
was detected in the asynchronous scanner, users could fall back to the 
synchronous scanner and avoid regressions.
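
The fallback arrangement can be sketched the same way, assuming a simple option flag.  
`ScannerConfig::use_async`, `ScanPath`, and the two scan functions below are 
hypothetical stand-ins, not Arrow's actual classes; the point is only that keeping 
the older, simpler path behind a switch gives users a way out if the newer path 
misbehaves.

```cpp
#include <vector>

// Hypothetical configuration; the flag name is illustrative, not Arrow's API.
struct ScannerConfig {
  bool use_async = true;  // default to the newer path
};

enum class ScanPath { kSync, kAsync };

struct ScanResult {
  std::vector<int> rows;  // stand-in for record batches
  ScanPath path;          // which implementation produced the rows
};

// Older, simpler implementation kept around as a fallback.
ScanResult ScanWithSyncScanner(const std::vector<int>& data) {
  return ScanResult{data, ScanPath::kSync};
}

// Newer implementation (in real code this would be the async machinery).
ScanResult ScanWithAsyncScanner(const std::vector<int>& data) {
  return ScanResult{data, ScanPath::kAsync};
}

// Single entry point: flipping the flag routes around a suspected bug in the
// async path without changing any calling code.
ScanResult Scan(const std::vector<int>& data, const ScannerConfig& config) {
  return config.use_async ? ScanWithAsyncScanner(data)
                          : ScanWithSyncScanner(data);
}
```

Once the async path is trusted, removing the fallback amounts to deleting 
`ScanWithSyncScanner` and the flag.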

I agree with David, though, that we can at some point just get rid of the 
synchronous scanner.  I think Ben encountered several bugs (which were 
addressed) in his use of the async scanner for ARROW-13238, so it's probably not 
bug free.  On the other hand, we've been using it for a while now and I 
wouldn't say we've run into anything major.

> [C++][Dataset] Use an ExecPlan for synchronous scans
> ----------------------------------------------------
>
>                 Key: ARROW-13328
>                 URL: https://issues.apache.org/jira/browse/ARROW-13328
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ben Kietzman
>            Priority: Major
>              Labels: dataset
>
> ARROW-13238 ensured that asynchronous dataset scans are internally backed by 
> ExecPlans, allowing easier integration with new ExecNode sinks.
> However synchronous scans are still backed by {{FilterAndProjectScanTask}} 
> and are not consumable using ExecNodes. Ideally, we should route synchronous 
> scans through an ExecPlan as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)