[
https://issues.apache.org/jira/browse/ARROW-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Keane updated ARROW-13434:
-----------------------------------
Description:
With dplyr, when we group_by with an unnamed expression, a column is added to
the dataframe that has the result of the expression.
{code}
> example_data %>%
+ group_by(int < 4) %>% collect()
# A tibble: 10 x 8
# Groups: int < 4 [3]
     int   dbl  dbl2 lgl   false chr   fct   `int < 4`
   <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>
 1     1   1.1     5 TRUE  FALSE a     a     TRUE
 2     2   2.1     5 NA    FALSE b     b     TRUE
 3     3   3.1     5 TRUE  FALSE c     c     TRUE
 4    NA   4.1     5 FALSE FALSE d     d     NA
 5     5   5.1     5 TRUE  FALSE e     NA    FALSE
 6     6   6.1     5 NA    FALSE NA    NA    FALSE
 7     7   7.1     5 NA    FALSE g     g     FALSE
 8     8   8.1     5 FALSE FALSE h     h     FALSE
 9     9  NA       5 FALSE FALSE i     i     FALSE
10    10  10.1     5 NA    FALSE j     j     FALSE
{code}
Arrow doesn't do this, however, because we (currently) only add columns when the
expression is named.
{code}
> Table$create(example_data) %>%
+ group_by(int < 4) %>% collect()
Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
dbl: double
dbl2: double
lgl: bool
false: bool
chr: string
fct: dictionary<values=string, indices=int8, ordered=0>
{code}
This isn't a big deal right now since grouped aggregations aren't (quite) here
yet, but once we start having support for that, we will have people using
examples like this.
was:
With dplyr, when we group_by with an expression, a column is added to the
dataframe that has the result of the expression.
{code}
> example_data %>%
+ group_by(int < 4) %>% collect()
# A tibble: 10 x 8
# Groups: int < 4 [3]
     int   dbl  dbl2 lgl   false chr   fct   `int < 4`
   <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>
 1     1   1.1     5 TRUE  FALSE a     a     TRUE
 2     2   2.1     5 NA    FALSE b     b     TRUE
 3     3   3.1     5 TRUE  FALSE c     c     TRUE
 4    NA   4.1     5 FALSE FALSE d     d     NA
 5     5   5.1     5 TRUE  FALSE e     NA    FALSE
 6     6   6.1     5 NA    FALSE NA    NA    FALSE
 7     7   7.1     5 NA    FALSE g     g     FALSE
 8     8   8.1     5 FALSE FALSE h     h     FALSE
 9     9  NA       5 FALSE FALSE i     i     FALSE
10    10  10.1     5 NA    FALSE j     j     FALSE
{code}
Arrow doesn't do this, however:
{code}
> Table$create(example_data) %>%
+ group_by(int < 4) %>% collect()
Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
dbl: double
dbl2: double
lgl: bool
false: bool
chr: string
fct: dictionary<values=string, indices=int8, ordered=0>
{code}
This isn't a big deal right now since grouped aggregations aren't (quite) here
yet, but once we start having support for that, we will have people using
examples like this. This might actually be something we need/want to do in C++
instead of in the R client.
The workaround is relatively simple: add the expression in a mutate, then
group_by that.
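The workaround above could look roughly like the following. This is a minimal sketch, not code from the ticket: the column name `int_lt_4` and the one-column stand-in for `example_data` are assumptions made for illustration, and it assumes the arrow and dplyr packages are installed.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for the ticket's example_data; only `int` matters here.
example_data <- data.frame(int = c(1:3, NA, 5:10))

# Name the expression in mutate() first, so the grouping column exists
# in the Table's schema, then group by that named column.
result <- Table$create(example_data) %>%
  mutate(int_lt_4 = int < 4) %>%
  group_by(int_lt_4) %>%
  collect()
```

Because the grouping column is created explicitly, its name is chosen by the caller (`int_lt_4` here) rather than derived from the expression text as dplyr does with `int < 4`.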
> [R] group_by() with an unnamed expression
> -----------------------------------------
>
> Key: ARROW-13434
> URL: https://issues.apache.org/jira/browse/ARROW-13434
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Jonathan Keane
> Priority: Major
>
> With dplyr, when we group_by with an unnamed expression, a column is added to
> the dataframe that has the result of the expression.
> {code}
> > example_data %>%
> + group_by(int < 4) %>% collect()
> # A tibble: 10 x 8
> # Groups: int < 4 [3]
>      int   dbl  dbl2 lgl   false chr   fct   `int < 4`
>    <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>
>  1     1   1.1     5 TRUE  FALSE a     a     TRUE
>  2     2   2.1     5 NA    FALSE b     b     TRUE
>  3     3   3.1     5 TRUE  FALSE c     c     TRUE
>  4    NA   4.1     5 FALSE FALSE d     d     NA
>  5     5   5.1     5 TRUE  FALSE e     NA    FALSE
>  6     6   6.1     5 NA    FALSE NA    NA    FALSE
>  7     7   7.1     5 NA    FALSE g     g     FALSE
>  8     8   8.1     5 FALSE FALSE h     h     FALSE
>  9     9  NA       5 FALSE FALSE i     i     FALSE
> 10    10  10.1     5 NA    FALSE j     j     FALSE
> {code}
> Arrow doesn't do this, however, because we (currently) only add columns when
> the expression is named.
> {code}
> > Table$create(example_data) %>%
> + group_by(int < 4) %>% collect()
> Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
> dbl: double
> dbl2: double
> lgl: bool
> false: bool
> chr: string
> fct: dictionary<values=string, indices=int8, ordered=0>
> {code}
> This isn't a big deal right now since grouped aggregations aren't (quite)
> here yet, but once we start having support for that, we will have people
> using examples like this.