This is an automated email from the ASF dual-hosted git repository. paleolimbot pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push: new 7de273b4ee GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list (#36305) 7de273b4ee is described below commit 7de273b4ee05278c02aab84d85d27fd17e532a00 Author: Dewey Dunnington <de...@voltrondata.com> AuthorDate: Tue Jun 27 09:18:55 2023 -0300 GH-35534: [R] Ensure missing grouping variables are added to the beginning of the variable list (#36305) ### Rationale for this change As reported by @ eitsupi, dplyr adds missing grouping variables to the beginning of the variable list; however, we add them to the *end* of the variable list. Our general policy is to match dplyr's behaviour everywhere. Before this PR: ``` r library(arrow, warn.conflicts = FALSE) #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information. library(dplyr, warn.conflicts = FALSE) tibble::tibble(int = 1:4, chr = letters[1:4]) |> group_by(chr) |> select(int) |> collect() #> Adding missing grouping variables: `chr` #> # A tibble: 4 × 2 #> # Groups: chr [4] #> chr int #> <chr> <int> #> 1 a 1 #> 2 b 2 #> 3 c 3 #> 4 d 4 arrow_table(int = 1:4, chr = letters[1:4]) |> group_by(chr) |> select(int) |> collect() #> # A tibble: 4 × 2 #> # Groups: chr [4] #> int chr #> <int> <chr> #> 1 1 a #> 2 2 b #> 3 3 c #> 4 4 d ``` After this PR: ``` r library(arrow, warn.conflicts = FALSE) #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information. library(dplyr, warn.conflicts = FALSE) tibble::tibble(int = 1:4, chr = letters[1:4]) |> group_by(chr) |> select(int) |> collect() #> Adding missing grouping variables: `chr` #> # A tibble: 4 × 2 #> # Groups: chr [4] #> chr int #> <chr> <int> #> 1 a 1 #> 2 b 2 #> 3 c 3 #> 4 d 4 arrow_table(int = 1:4, chr = letters[1:4]) |> group_by(chr) |> select(int) |> collect() #> # A tibble: 4 × 2 #> # Groups: chr [4] #> chr int #> <chr> <int> #> 1 a 1 #> 2 b 2 #> 3 c 3 #> 4 d 4 ``` ### Are these changes tested? Yes, a test was added. ### Are there any user-facing changes? Yes, column ordering will be different. This could be a breaking change because existing code that refers to columns by index may change; however, referring to a column by name is much more common. * Closes: #35534 Authored-by: Dewey Dunnington <de...@voltrondata.com> Signed-off-by: Dewey Dunnington <de...@voltrondata.com> --- r/R/dplyr.R | 4 ++-- r/tests/testthat/test-dplyr-select.R | 11 +++++++++++ 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/r/R/dplyr.R b/r/R/dplyr.R index 62b345e1ce..f11b88d301 100644 --- a/r/R/dplyr.R +++ b/r/R/dplyr.R @@ -316,8 +316,8 @@ ensure_group_vars <- function(x) { if (length(gv)) { # Add them back x$selected_columns <- c( - x$selected_columns, - make_field_refs(gv) + make_field_refs(gv), + x$selected_columns ) } } diff --git a/r/tests/testthat/test-dplyr-select.R b/r/tests/testthat/test-dplyr-select.R index dff73c063b..ba2db175ef 100644 --- a/r/tests/testthat/test-dplyr-select.R +++ b/r/tests/testthat/test-dplyr-select.R @@ -28,6 +28,7 @@ test_that("Empty select returns no columns", { tbl ) }) + test_that("Empty select still includes the group_by columns", { expect_message( compare_dplyr_binding( @@ -38,6 +39,16 @@ test_that("Empty select still includes the group_by columns", { ) }) +test_that("Missing grouping columns are added to the beginning of the list", { + expect_message( + compare_dplyr_binding( + .input %>% group_by(chr) %>% select(int) %>% collect(), + tbl + ), + "Adding missing grouping variables" + ) +}) + test_that("select/rename/rename_with", { compare_dplyr_binding( .input %>%