[ https://issues.apache.org/jira/browse/ARROW-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479636#comment-17479636 ]
Dewey Dunnington commented on ARROW-15271:
------------------------------------------
Just collecting a few related code comments here:
- https://github.com/apache/arrow/blob/03219e21b42f17294fba3b3d2b22a9117fe0f080/r/R/dataset-scan.R#L89
- https://github.com/apache/arrow/blob/03219e21b42f17294fba3b3d2b22a9117fe0f080/r/R/query-engine.R#L23-L26
- https://github.com/apache/arrow/blob/03219e21b42f17294fba3b3d2b22a9117fe0f080/r/R/dataset-scan.R#L184
Related is the ability to write files directly in a query plan using the
{{WriteNode}} that was added in ARROW-13542. For example, there is an open
ticket for using the {{WriteNode}} to write datasets (ARROW-14266). Writing
files is useful but perhaps orthogonal to the ability to iterate over a
{{RecordBatchReader}}, which is exemplified by the revamped {{map_batches()}}
and the vignette addition.
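
To make the reader-versus-table distinction concrete, here is a minimal
sketch in plain Python. It is not the arrow R API: the names
exec_plan_reader, exec_plan_table, map_batches and the dict-based "batch"
are hypothetical stand-ins. It only illustrates why a reader-style result
lets a consumer handle one batch at a time, while a table-style result
forces every batch to be produced first.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical stand-in for a record batch: column name -> values.
Batch = Dict[str, List[int]]

# Records which batches the toy "plan" has actually produced so far.
produced: List[int] = []


def exec_plan_reader(n_batches: int, batch_size: int) -> Iterator[Batch]:
    """Yield batches lazily, the way a reader over a running plan would."""
    for i in range(n_batches):
        produced.append(i)
        start = i * batch_size
        yield {"x": list(range(start, start + batch_size))}


def exec_plan_table(n_batches: int, batch_size: int) -> List[Batch]:
    """Materialize every batch up front, the way returning a table would."""
    return list(exec_plan_reader(n_batches, batch_size))


def map_batches(
    reader: Iterator[Batch], fun: Callable[[Batch], Batch]
) -> Iterator[Batch]:
    """Apply fun to each batch as it arrives, without collecting first."""
    for batch in reader:
        yield fun(batch)


def double(batch: Batch) -> Batch:
    """Example per-batch function: double every value in column x."""
    return {"x": [v * 2 for v in batch["x"]]}


mapped = map_batches(exec_plan_reader(n_batches=4, batch_size=3), double)
first = next(mapped)
print(first)     # {'x': [0, 2, 4]}
print(produced)  # [0], only the first batch has been produced
```

With the table-style variant, all four batches would be produced before the
first call to the per-batch function, which is the cost the refactor is
meant to avoid.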
> [R] Refactor do_exec_plan to return a RecordBatchReader
> -------------------------------------------------------
>
> Key: ARROW-15271
> URL: https://issues.apache.org/jira/browse/ARROW-15271
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Affects Versions: 6.0.1
> Reporter: Will Jones
> Priority: Major
>
> Right now
> [{{do_exec_plan}}|https://github.com/apache/arrow/blob/master/r/R/query-engine.R#L18]
> returns an Arrow table because {{head}}, {{tail}}, and {{arrange}} do. If
> ARROW-14289 is completed and similar work is done for {{arrange}}, we may be
> able to alter {{do_exec_plan}} to return a {{RecordBatchReader}} (RBR) instead.
> The {{map_batches()}} implementation (ARROW-14029) could benefit from this
> refactor. And it might make ARROW-15040 more useful.