[
https://issues.apache.org/jira/browse/ARROW-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Keane updated ARROW-14428:
-----------------------------------
Description:
Right now, I can:
{code}
ds <- open_dataset("some.parquet")
ds %>%
  mutate(
    o_orderdate = cast(o_orderdate, date32())
  ) %>%
  write_dataset(path = "new.parquet")
{code}
but I can't:
{code}
tab <- read_parquet("some.parquet", as_data_frame = FALSE)
tab %>%
  mutate(
    o_orderdate = cast(o_orderdate, date32())
  ) %>%
  write_parquet("new.parquet")
{code}
In this case, I can cast the column as a separate command and then
{{write_parquet()}} after, but it would be nice to be able to use
{{write_parquet()}} in a pipeline. A sketch of such a workaround is shown below.
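A minimal sketch of one possible workaround (not necessarily the exact commands used; it assumes {{compute()}} is used to evaluate the query into a Table before writing):
{code}
library(arrow)
library(dplyr)

tab <- read_parquet("some.parquet", as_data_frame = FALSE)

# Evaluate the mutate() eagerly into a new in-memory Table ...
tab <- tab %>%
  mutate(o_orderdate = cast(o_orderdate, date32())) %>%
  compute()

# ... and only then write, since write_parquet() currently needs a Table.
write_parquet(tab, "new.parquet")
{code}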
This will require a libarrow addition: another version of WriteParquet that
takes a RecordBatchReader instead of a fully instantiated Table.
was:
Right now, I can:
{code}
ds <- open_dataset("some.parquet")
ds %>%
  mutate(
    o_orderdate = cast(o_orderdate, date32())
  ) %>%
  write_dataset(path = "new.parquet")
{code}
but I can't:
{code}
tab <- read_parquet("some.parquet", as_data_frame = FALSE)
tab %>%
  mutate(
    o_orderdate = cast(o_orderdate, date32())
  ) %>%
  write_parquet("new.parquet")
{code}
In this case, I can cast the column as a separate command and then
{{write_parquet()}} after, but it would be nice to be able to use
{{write_parquet()}} in a pipeline.
> [R] [C++] Allow me to write_parquet() from an arrow_dplyr_query
> ----------------------------------------------------------------
>
> Key: ARROW-14428
> URL: https://issues.apache.org/jira/browse/ARROW-14428
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, R
> Reporter: Jonathan Keane
> Priority: Major
>
> Right now, I can:
> {code}
> ds <- open_dataset("some.parquet")
> ds %>%
>   mutate(
>     o_orderdate = cast(o_orderdate, date32())
>   ) %>%
>   write_dataset(path = "new.parquet")
> {code}
> but I can't:
> {code}
> tab <- read_parquet("some.parquet", as_data_frame = FALSE)
> tab %>%
>   mutate(
>     o_orderdate = cast(o_orderdate, date32())
>   ) %>%
>   write_parquet("new.parquet")
> {code}
> In this case, I can cast the column as a separate command and then
> {{write_parquet()}} after, but it would be nice to be able to use
> {{write_parquet()}} in a pipeline.
> This will require a libarrow addition: another version of WriteParquet
> that takes a RecordBatchReader instead of a fully instantiated Table.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)