[ 
https://issues.apache.org/jira/browse/ARROW-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Keane updated ARROW-14428:
-----------------------------------
    Description: 
Right now, I can:
{code}
library(arrow)
library(dplyr)

ds <- open_dataset("some.parquet")
ds %>%
  mutate(
    o_orderdate = cast(o_orderdate, date32())
  ) %>%
  write_dataset(path = "new.parquet")
{code}

but I can't:
{code}
tab <- read_parquet("some.parquet", as_data_frame = FALSE)
tab %>% 
  mutate(
    o_orderdate = cast(o_orderdate, date32())  
  ) %>% 
  write_parquet("new.parquet")
{code}

In this case, I can cast the column as a separate command and then call
{{write_parquet()}} afterwards, but it would be nice to be able to use
{{write_parquet()}} in a pipeline.
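For reference, the two-step workaround might look something like this (a minimal sketch reusing the column name and file paths from the example above; it casts on the Table directly rather than in a dplyr pipeline):

{code}
library(arrow)

tab <- read_parquet("some.parquet", as_data_frame = FALSE)
# Cast the column as a separate step on the Table...
tab$o_orderdate <- tab$o_orderdate$cast(date32())
# ...then write the fully-materialized Table out
write_parquet(tab, "new.parquet")
{code}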

This will require a libarrow addition: another version of WriteParquet that
takes a RecordBatchReader instead of a fully-instantiated Table.


> [R] [C++] Allow me to write_parquet() from an arrow_dplyr_query 
> ----------------------------------------------------------------
>
>                 Key: ARROW-14428
>                 URL: https://issues.apache.org/jira/browse/ARROW-14428
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, R
>            Reporter: Jonathan Keane
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)