[jira] [Updated] (ARROW-15517) [R] Use WriteNode in write_dataset()

ASF GitHub Bot (Jira) Tue, 01 Feb 2022 13:22:03 -0800


     [ https://issues.apache.org/jira/browse/ARROW-15517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ASF GitHub Bot updated ARROW-15517:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R] Use WriteNode in write_dataset()
> ------------------------------------
>
>                 Key: ARROW-15517
>                 URL: https://issues.apache.org/jira/browse/ARROW-15517
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, write_dataset() uses the Scanner interface, which can't handle
> everything that an ExecPlan can. So if your arrow_dplyr_query contains
> aggregations or (more importantly) joins, you have to materialize the Table
> in memory before you can write it to disk. The WriteNode added in
> ARROW-13542 is a special sink node that can be put at the end of an ExecPlan,
> so data should be able to stream to disk in more cases, and write_dataset()
> will benefit from future improvements to ExecPlan memory usage and spillover.
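
To illustrate the idea of a sink node at the end of a plan, here is a toy
sketch in plain Python. It is not Arrow's API: scan, join, and write_sink are
hypothetical stand-ins for a source node, a join node, and a write sink, and
the CSV output is only an assumption to keep the example self-contained. The
point is that batches flow through the join and onto disk one at a time, so
the joined result is never held in memory as a whole.

```python
import csv
import os
import tempfile


def scan(rows, batch_size):
    """Source node (hypothetical): yield the input in small batches."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def join(batches, lookup, key):
    """Join node (hypothetical): inner-join each streamed batch against a
    small in-memory lookup table, dropping rows with no match."""
    for batch in batches:
        yield [{**row, **lookup[row[key]]} for row in batch if row[key] in lookup]


def write_sink(batches, path, fieldnames):
    """Sink node (hypothetical): write each batch to disk as it arrives.

    Returns the number of rows written and the largest batch seen, which
    shows that only one batch at a time is held by the sink.
    """
    written = 0
    largest_batch = 0
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for batch in batches:
            writer.writerows(batch)
            written += len(batch)
            largest_batch = max(largest_batch, len(batch))
    return written, largest_batch


# Made-up data: ten orders, and a lookup table where customer 2 has no match.
orders = [{"order_id": i, "customer_id": i % 3} for i in range(10)]
customers = {0: {"name": "ada"}, 1: {"name": "grace"}}

out_path = os.path.join(tempfile.mkdtemp(), "joined.csv")

# The plan is lazy: nothing runs until the sink starts pulling batches.
plan = join(scan(orders, batch_size=4), customers, key="customer_id")
written, largest_batch = write_sink(plan, out_path, ["order_id", "customer_id", "name"])
```

In this sketch the sink never sees more than one batch at a time, which is
the property the issue hopes to get for joins and aggregations by ending the
ExecPlan in a WriteNode instead of materializing a Table first.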



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
