[ 
https://issues.apache.org/jira/browse/ARROW-14266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461548#comment-17461548
 ] 

Jonathan Keane commented on ARROW-14266:
----------------------------------------

I'm sure it'll end up being more complicated than this, but the first step
would seem to be replacing the Scanner code inside {{write_dataset}}
(https://github.com/apache/arrow/blob/master/r/R/dataset-write.R#L126-L134)
and then, in {{dataset___Dataset_write}}, adding something like
{{MakeWriteNode}}
(https://github.com/apache/arrow/pull/11017/files#diff-2caf4e9bd3f139e05e55dca80725d8a9c436f5ccf65c76a37cebfa6ee9b36a6aR411-R422).
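To make the proposal concrete, here is a toy, push-based model of an exec plan whose last node writes each batch as it arrives. That is roughly the role a node created by something like MakeWriteNode would play at the end of the plan built in dataset___Dataset_write. This is a sketch under stated assumptions only: Node, FilterNode, WriteNode and run_plan are hypothetical names, not Arrow's C++ or R API, and a real write node would write file fragments (e.g. Parquet or IPC) rather than append to a list.

```python
# Toy model only: Node, FilterNode, WriteNode and run_plan are hypothetical
# names and are NOT Arrow's C++ or R API.
from typing import Callable, Iterable, List, Optional

Batch = List[dict]  # a "record batch" modeled as a list of row dicts


class Node:
    """Push-based node: receives batches from upstream, pushes to `output`."""

    def __init__(self) -> None:
        self.output: Optional["Node"] = None

    def input_received(self, batch: Batch) -> None:
        raise NotImplementedError

    def input_finished(self) -> None:
        # Propagate end-of-stream downstream.
        if self.output is not None:
            self.output.input_finished()


class FilterNode(Node):
    """Forwards only the rows that satisfy `predicate`."""

    def __init__(self, predicate: Callable[[dict], bool]) -> None:
        super().__init__()
        self.predicate = predicate

    def input_received(self, batch: Batch) -> None:
        kept = [row for row in batch if self.predicate(row)]
        if kept and self.output is not None:
            self.output.input_received(kept)


class WriteNode(Node):
    """Sink: 'writes' each batch as soon as it arrives."""

    def __init__(self, sink: List[Batch]) -> None:
        super().__init__()
        self.sink = sink  # stand-in for the files on disk
        self.finished = False

    def input_received(self, batch: Batch) -> None:
        self.sink.append(batch)  # stand-in for writing a file fragment

    def input_finished(self) -> None:
        self.finished = True  # a real writer would close open files here


def run_plan(source: Iterable[Batch], nodes: List[Node]) -> None:
    """Chain nodes[0] -> nodes[1] -> ... and push every source batch through."""
    for upstream, downstream in zip(nodes, nodes[1:]):
        upstream.output = downstream
    for batch in source:
        nodes[0].input_received(batch)
    nodes[0].input_finished()
```

For example, `run_plan(batches, [FilterNode(lambda r: r["x"] > 3), WriteNode(written)])` hands each filtered batch to the write node as soon as it is produced, instead of collecting the whole query result before a Scanner writes it.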

https://github.com/apache/arrow/blob/master/r/R/query-engine.R has much of the
other ExecNode implementation for creating and running exec plans, and
https://github.com/apache/arrow/blob/master/r/src/compute-exec.cpp has the C++
code that links to libarrow.

> [R] Use WriteNode to write queries
> ----------------------------------
>
>                 Key: ARROW-14266
>                 URL: https://issues.apache.org/jira/browse/ARROW-14266
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Major
>              Labels: query-engine
>             Fix For: 7.0.0
>
>
> Following ARROW-13542. Any query that has a join or an aggregation currently 
> has to first evaluate the query and hold it in memory before creating a 
> Scanner to write it. We could improve that by using a WriteNode inside 
> write_dataset() (and maybe that improves the other cases too, or at least 
> allows us to delete some code). 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
