[ https://issues.apache.org/jira/browse/ARROW-15517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-15517:
-----------------------------------
Labels: pull-request-available (was: )
> [R] Use WriteNode in write_dataset()
> ------------------------------------
>
> Key: ARROW-15517
> URL: https://issues.apache.org/jira/browse/ARROW-15517
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Assignee: Neal Richardson
> Priority: Major
> Labels: pull-request-available
> Fix For: 8.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, write_dataset() uses the Scanner interface, which can't handle
> everything that an ExecPlan can. So if your arrow_dplyr_query contains
> aggregations or (more importantly) joins, you have to materialize
> the Table in memory before you can write it to disk. The WriteNode added in
> ARROW-13542 is a special sink node that can be placed at the end of an
> ExecPlan, so data should be able to stream to disk in more cases, and writes
> will benefit from future improvements to ExecPlan memory usage and spillover.
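
To illustrate the idea of a sink node at the end of a plan, here is a toy sketch in plain Python. It is not Arrow's API: the names source, hash_join, write_sink and run_example are hypothetical, the CSV output is a stand-in for a real dataset writer, and the join is a simplified inner hash join. It only shows how batches coming out of a join could be written to disk one at a time rather than being collected into a single in-memory table first.

```python
import csv
import os
import tempfile
from typing import Dict, Iterable, Iterator, List

Batch = List[Dict[str, object]]


def source(rows: List[Dict[str, object]], batch_size: int) -> Iterator[Batch]:
    """Yield rows in fixed-size batches, like a scan feeding a plan."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]


def hash_join(left: Iterable[Batch], right_rows: List[Dict[str, object]],
              key: str) -> Iterator[Batch]:
    """Inner join: build a hash table from the right side, stream the left."""
    table: Dict[object, List[Dict[str, object]]] = {}
    for row in right_rows:
        table.setdefault(row[key], []).append(row)
    for batch in left:
        joined = [{**l, **r} for l in batch for r in table.get(l[key], [])]
        if joined:  # skip batches with no matching rows
            yield joined


def write_sink(batches: Iterable[Batch], out_dir: str) -> List[str]:
    """Terminal node: write each batch to its own file as soon as it arrives."""
    paths = []
    for i, batch in enumerate(batches):
        path = os.path.join(out_dir, f"part-{i}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(batch[0].keys()))
            writer.writeheader()
            writer.writerows(batch)
        paths.append(path)
    return paths


def run_example() -> List[str]:
    """Join made-up orders to customers and stream the result to a temp dir."""
    orders = [{"id": i, "cust": i % 3} for i in range(6)]
    customers = [{"cust": 0, "name": "a"}, {"cust": 1, "name": "b"}]
    out_dir = tempfile.mkdtemp()
    return write_sink(hash_join(source(orders, 2), customers, "cust"), out_dir)
```

In this sketch only the build side of the join and one output batch are held in memory at a time; the full join result is never assembled as a single table, which is the behaviour the description hopes a WriteNode can provide in more cases.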
--
This message was sent by Atlassian Jira
(v8.20.1#820001)