Jonathan Keane created ARROW-13802:
--------------------------------------

             Summary: [R] accept expression lists in Scanner$create() with arrow_dplyr_querys
                 Key: ARROW-13802
                 URL: https://issues.apache.org/jira/browse/ARROW-13802
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
            Reporter: Jonathan Keane


ARROW-13560 enabled projections and filters when using {{Scanner$create()}} on
{{arrow_dplyr_query}} objects. Projections should also accept (lists of)
Expressions, so that new columns could be added (or existing columns renamed) at
this point as well. However, supporting that takes more work: we need to make
sure that projection expressions can refer to columns created earlier in the
query.

In the example below, {{int_plus}} was created by an earlier {{mutate()}} and is
present in {{proj}} when we call {{Scanner$create()}}. However, we cannot refer
to it with {{Expression$field_ref("int_plus")}}, because {{int_plus}} is not a
field in the underlying dataset.
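{code:r}
library(arrow)
library(dplyr)

# `ds` is assumed to be a Dataset with columns int, dbl and lgl
ds %>%
    filter(int > 7) %>%
    select(int, dbl, lgl) %>%
    mutate(int_plus = int + 1) %>%
    Scanner$create(projection = list(
      int = Expression$field_ref("int"),
      # int_plus was created by mutate() and is not a field of `ds`,
      # so this field reference cannot be resolved
      int_plus = Expression$field_ref("int_plus"),
      dbl_minus = Expression$create(
        "subtract_checked",
        Expression$field_ref("dbl"),
        Expression$scalar(1)
      )
    ))
{code}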


One (hacky) way to handle this is the following: find every projection
expression that is only a bare field reference, then replace it with the
corresponding expression from {{proj}}.

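{code:r}
# `projection`: the list of Expressions passed to Scanner$create()
# `proj`: the query's existing projection (named list of Expressions)
# map_lgl()/map_chr() as in purrr (formula shorthand ~.x)

# Bare field references have a non-empty field_name
only_field_refs <- map_lgl(projection, ~.x$field_name != "")
field_refs_to_replace <- map_chr(projection[only_field_refs], ~.x$field_name)
# Swap each bare field reference for the matching expression from proj
projection[only_field_refs] <- proj[field_refs_to_replace]
{code}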
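To illustrate the replacement step without arrow, here is a minimal base-R sketch under stated assumptions: {{make_expr()}} is a hypothetical helper, and the plain lists only mimic an Expression's {{field_name}} (empty for anything that is not a bare field reference). It uses {{vapply()}} in place of {{map_lgl()}}/{{map_chr()}}; no real arrow Expression objects are involved.

{code:r}
# Hypothetical stand-in for an Expression: `field_name` is "" unless the
# expression is a bare field reference; `label` just describes it.
make_expr <- function(label, field_name = "") {
  list(label = label, field_name = field_name)
}

# Mimics the query's existing projection: int_plus is a derived
# expression, not a field of the dataset.
proj <- list(
  int = make_expr("int", field_name = "int"),
  dbl = make_expr("dbl", field_name = "dbl"),
  lgl = make_expr("lgl", field_name = "lgl"),
  int_plus = make_expr("add(int, 1)")
)

# Mimics the user-supplied projection passed to Scanner$create()
projection <- list(
  int = make_expr("int", field_name = "int"),
  int_plus = make_expr("int_plus", field_name = "int_plus"),
  dbl_minus = make_expr("subtract_checked(dbl, 1)")
)

# Same logic as the snippet above, written with base R
only_field_refs <- vapply(projection, function(x) x$field_name != "", logical(1))
field_refs_to_replace <- vapply(projection[only_field_refs],
                                function(x) x$field_name, character(1))
# `[<-` keeps the names of `projection`; only the values are swapped
projection[only_field_refs] <- proj[field_refs_to_replace]

vapply(projection, function(x) x$label, character(1))
{code}

After the swap, {{int_plus}} holds the derived expression from {{proj}} instead of an unresolvable field reference. Only bare field references are rewritten, so an expression that uses {{int_plus}} inside a function call (for example {{int_plus * 2}}) would presumably still need its own substitution.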



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
