paleolimbot opened a new pull request, #36305:
URL: https://github.com/apache/arrow/pull/36305
### Rationale for this change
As reported by @eitsupi, dplyr adds missing grouping variables to the
beginning of the variable list; however, we add them to the *end* of the
variable list. Our general policy is to match dplyr's behaviour everywhere.
Before this PR:
``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(dplyr, warn.conflicts = FALSE)
tibble::tibble(int = 1:4, chr = letters[1:4]) |>
  group_by(chr) |>
  select(int) |>
  collect()
#> Adding missing grouping variables: `chr`
#> # A tibble: 4 × 2
#> # Groups:   chr [4]
#>   chr     int
#>   <chr> <int>
#> 1 a         1
#> 2 b         2
#> 3 c         3
#> 4 d         4

arrow_table(int = 1:4, chr = letters[1:4]) |>
  group_by(chr) |>
  select(int) |>
  collect()
#> # A tibble: 4 × 2
#> # Groups:   chr [4]
#>     int chr
#>   <int> <chr>
#> 1     1 a
#> 2     2 b
#> 3     3 c
#> 4     4 d
```
After this PR:
``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(dplyr, warn.conflicts = FALSE)
tibble::tibble(int = 1:4, chr = letters[1:4]) |>
  group_by(chr) |>
  select(int) |>
  collect()
#> Adding missing grouping variables: `chr`
#> # A tibble: 4 × 2
#> # Groups:   chr [4]
#>   chr     int
#>   <chr> <int>
#> 1 a         1
#> 2 b         2
#> 3 c         3
#> 4 d         4

arrow_table(int = 1:4, chr = letters[1:4]) |>
  group_by(chr) |>
  select(int) |>
  collect()
#> # A tibble: 4 × 2
#> # Groups:   chr [4]
#>   chr     int
#>   <chr> <int>
#> 1 a         1
#> 2 b         2
#> 3 c         3
#> 4 d         4
```
### Are these changes tested?
Yes, a test was added.
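
For readers who want to check the behaviour locally, here is a minimal sketch of such a check. It is not the test added in this PR: it uses only public `testthat`, `dplyr`, and `arrow` functions, and the test description and variable names are illustrative. It assumes an arrow build that includes this change; on an earlier build the expectation fails.

``` r
library(testthat)
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

df <- tibble::tibble(int = 1:4, chr = letters[1:4])

# Reference result from dplyr on an in-memory tibble. dplyr emits a message
# about adding the missing grouping variable, which is silenced here.
expected <- suppressMessages(
  df |>
    group_by(chr) |>
    select(int) |>
    collect()
)

# The same pipeline evaluated through arrow.
actual <- arrow_table(df) |>
  group_by(chr) |>
  select(int) |>
  collect()

test_that("select() adds missing grouping variables first (illustrative)", {
  # Column order should match dplyr: grouping variable first.
  expect_identical(names(actual), names(expected))
})
```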
### Are there any user-facing changes?
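Yes, the column order changes: missing grouping variables are now added at
the beginning of the result instead of the end. This could be a breaking
change because existing code that refers to columns by index may now select
a different column; however, referring to a column by name is much more
common.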
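
To illustrate the difference between the two access styles, here is a small sketch that reuses the pipeline from the examples above. The variable name `result` is illustrative.

``` r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

result <- arrow_table(int = 1:4, chr = letters[1:4]) |>
  group_by(chr) |>
  select(int) |>
  collect()

# Positional access depends on column order: this returned the `int` column
# before this PR and returns the `chr` column after it.
first_col <- result[[1]]

# Name-based access returns the same column regardless of column order.
int_col <- result[["int"]]
chr_col <- result[["chr"]]
```

Code that needs a fixed column order can also make it explicit by listing the grouping variable in the selection, for example `select(chr, int)`.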