Neal Richardson created ARROW-15517:
---------------------------------------

             Summary: [R] Use WriteNode in write_dataset()
                 Key: ARROW-15517
                 URL: https://issues.apache.org/jira/browse/ARROW-15517
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Neal Richardson
            Assignee: Neal Richardson
             Fix For: 8.0.0


Currently, write_dataset() uses the Scanner interface, which can't handle 
everything that an ExecPlan can. So if your arrow_dplyr_query contains things 
like aggregations or (more importantly) joins, you have to materialize the 
Table in memory before you can write it to disk. The WriteNode added in 
ARROW-13542 is a special sink node that can be placed at the end of an ExecPlan. 
Using it in write_dataset() should let data stream to disk in more cases, and 
writes will benefit from future improvements to ExecPlan memory usage and 
spillover.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
