Jonathan Keane created ARROW-13802:
--------------------------------------
Summary: [R] accept expression lists in Scanner$create() with
arrow_dplyr_querys
Key: ARROW-13802
URL: https://issues.apache.org/jira/browse/ARROW-13802
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Jonathan Keane
ARROW-13560 enabled projections and filters when using {{Scanner$create()}} on
{{arrow_dplyr_query}} objects. Projections should also accept Expressions (or
lists of them), so that new columns can be added, or existing ones renamed, at
this point as well. Doing that requires extra handling, though, so that columns
created earlier in the query can still be selected.
For example, in the query below {{int_plus}} was created by an earlier
{{mutate()}} and is already in {{proj}} when we call {{Scanner$create()}}.
Even so, we cannot refer to it with {{Expression$field_ref("int_plus")}} (as
the example does), because {{int_plus}} is not a field.
{code:r}
ds %>%
  filter(int > 7) %>%
  select(int, dbl, lgl) %>%
  mutate(int_plus = int + 1) %>%
  Scanner$create(projection = list(
    int = Expression$field_ref("int"),
    # int_plus comes from the mutate() above, so it is not a field
    int_plus = Expression$field_ref("int_plus"),
    dbl_minus = Expression$create(
      "subtract_checked",
      Expression$field_ref("dbl"),
      Expression$scalar(1)
    )
  ))
{code}
One (hacky) way to do this is something like the following, which finds the
projection expressions that are bare field references and replaces each of
them with the entry of the same name in {{proj}}.
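{code:r}
# map_lgl() and map_chr() come from purrr
# Flag the expressions that are nothing more than a field reference
only_field_refs <- map_lgl(projection, ~.x$field_name != "")
field_refs_to_replace <- map_chr(projection[only_field_refs], ~.x$field_name)
# Swap each flagged expression for the entry of the same name in proj
projection[only_field_refs] <- proj[field_refs_to_replace]
{code}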
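To make the substitution easier to follow without a dataset at hand, here is a minimal base-R sketch of the same idea. It is only an illustration under stated assumptions: {{field_ref()}}, {{call_expr()}} and {{resolve_field_refs()}} are hypothetical helpers, and the plain lists stand in for real arrow Expressions. The only behaviour borrowed from the snippet above is that {{field_name}} is {{""}} for anything that is not a bare field reference.

```r
# Hypothetical stand-ins for arrow Expressions (NOT the arrow API):
# plain lists whose `field_name` is "" unless the expression is a bare
# field reference.
field_ref <- function(name) list(kind = "field_ref", field_name = name)
call_expr <- function(fun, ...) {
  list(kind = "call", field_name = "", fun = fun, args = list(...))
}

# Mimics the query's existing projection after select() and mutate().
proj <- list(
  int = field_ref("int"),
  dbl = field_ref("dbl"),
  lgl = field_ref("lgl"),
  int_plus = call_expr("add_checked", field_ref("int"), 1)
)

# Mimics the list passed as `projection` to Scanner$create().
projection <- list(
  int = field_ref("int"),
  int_plus = field_ref("int_plus"),
  dbl_minus = call_expr("subtract_checked", field_ref("dbl"), 1)
)

# Replace every bare field reference with the expression of the same
# name in `proj`; leave all other expressions untouched.
resolve_field_refs <- function(projection, proj) {
  only_field_refs <- vapply(
    projection, function(x) x$field_name != "", logical(1)
  )
  field_refs_to_replace <- vapply(
    projection[only_field_refs], function(x) x$field_name, character(1)
  )
  # Fail loudly on names that the earlier projection does not define.
  unknown <- setdiff(field_refs_to_replace, names(proj))
  if (length(unknown) > 0) {
    stop("Unknown columns: ", paste(unknown, collapse = ", "))
  }
  projection[only_field_refs] <- proj[field_refs_to_replace]
  projection
}

resolved <- resolve_field_refs(projection, proj)
```

In this sketch {{resolved$int_plus}} ends up holding the stand-in call expression from {{proj}} rather than a reference to a non-existent field, while {{dbl_minus}} is left as it was. Only top-level references are replaced: a reference to {{int_plus}} nested inside another expression would not be resolved, which is one reason the approach is described as hacky.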