[
https://issues.apache.org/jira/browse/ARROW-14266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461548#comment-17461548
]
Jonathan Keane commented on ARROW-14266:
----------------------------------------
I'm sure it'll end up being more complicated than this, but the first step
would seem to be replacing the scanner bits inside of {{write_dataset}}
(https://github.com/apache/arrow/blob/master/r/R/dataset-write.R#L126-L134)
and then, in {{dataset___Dataset_write}}, adding something like
{{MakeWriteNode}}
(https://github.com/apache/arrow/pull/11017/files#diff-2caf4e9bd3f139e05e55dca80725d8a9c436f5ccf65c76a37cebfa6ee9b36a6aR411-R422).
https://github.com/apache/arrow/blob/master/r/R/query-engine.R has a lot of the
other ExecNode implementation for creating the exec plans and running them, and
https://github.com/apache/arrow/blob/master/r/src/compute-exec.cpp has the C++
that links to libarrow.
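For reference, a minimal sketch of the kind of user-facing call this change would affect: a dplyr query containing an aggregation, passed straight to {{write_dataset}}. The data ({{mtcars}}), the column names, and the temporary output directory are illustrative assumptions only; nothing here shows the internal ExecPlan or WriteNode wiring.

```r
library(arrow)
library(dplyr)

# Build a query with an aggregation on an in-memory Arrow Table.
# mtcars is only a stand-in dataset for illustration.
query <- Table$create(mtcars) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# Write the (not yet collected) query out as a dataset.
# Per the issue description, a query like this is currently evaluated
# and held in memory before a Scanner is created to write it.
out_dir <- tempfile()
write_dataset(query, out_dir, format = "parquet")

# Read the result back to check what was written.
open_dataset(out_dir) %>% collect()
```

The user-facing call would presumably stay the same; only what happens inside {{write_dataset}} would change.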
> [R] Use WriteNode to write queries
> ----------------------------------
>
> Key: ARROW-14266
> URL: https://issues.apache.org/jira/browse/ARROW-14266
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Priority: Major
> Labels: query-engine
> Fix For: 7.0.0
>
>
> Following ARROW-13542. Any query that has a join or an aggregation currently
> has to first evaluate the query and hold it in memory before creating a
> Scanner to write it. We could improve that by using a WriteNode inside
> write_dataset() (and maybe that improves the other cases too, or at least
> allows us to delete some code).
--
This message was sent by Atlassian Jira
(v8.20.1#820001)