paleolimbot opened a new pull request, #36305:
URL: https://github.com/apache/arrow/pull/36305

   ### Rationale for this change
   
   As reported by @eitsupi, dplyr adds missing grouping variables to the 
beginning of the variable list; however, we add them to the *end* of the 
variable list. Our general policy is to match dplyr's behaviour everywhere.
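   
   The intended ordering can be sketched in a few lines. This is a minimal illustration, assuming column names are held in plain character vectors; `add_missing_groups()` is a hypothetical helper written for this example, not the function changed in this PR.
   
   ``` r
   # Hypothetical helper for illustration only; not part of arrow or dplyr.
   # `selected` and `groups` are character vectors of column names.
   add_missing_groups <- function(selected, groups) {
     # Grouping variables that the user did not select explicitly
     missing_groups <- setdiff(groups, selected)
     # dplyr's convention: missing grouping variables go *before* the selection
     c(missing_groups, selected)
   }
   
   # A grouping variable that was not selected is placed first
   add_missing_groups(selected = "int", groups = "chr")
   # A grouping variable that was already selected keeps its position
   add_missing_groups(selected = c("int", "chr"), groups = "chr")
   ```
   
   The previous behaviour corresponds to `c(selected, missing_groups)`, which is why `chr` ended up last.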
   
   Before this PR:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(dplyr, warn.conflicts = FALSE)
   
   tibble::tibble(int = 1:4, chr = letters[1:4]) |> 
     group_by(chr) |> 
     select(int) |> 
     collect()
   #> Adding missing grouping variables: `chr`
   #> # A tibble: 4 × 2
   #> # Groups:   chr [4]
   #>   chr     int
   #>   <chr> <int>
   #> 1 a         1
   #> 2 b         2
   #> 3 c         3
   #> 4 d         4
   
   arrow_table(int = 1:4, chr = letters[1:4]) |> 
     group_by(chr) |> 
     select(int) |> 
     collect()
   #> # A tibble: 4 × 2
   #> # Groups:   chr [4]
   #>     int chr  
   #>   <int> <chr>
   #> 1     1 a    
   #> 2     2 b    
   #> 3     3 c    
   #> 4     4 d
   ```
   
   After this PR:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(dplyr, warn.conflicts = FALSE)
   
   tibble::tibble(int = 1:4, chr = letters[1:4]) |> 
     group_by(chr) |> 
     select(int) |> 
     collect()
   #> Adding missing grouping variables: `chr`
   #> # A tibble: 4 × 2
   #> # Groups:   chr [4]
   #>   chr     int
   #>   <chr> <int>
   #> 1 a         1
   #> 2 b         2
   #> 3 c         3
   #> 4 d         4
   
   arrow_table(int = 1:4, chr = letters[1:4]) |> 
     group_by(chr) |> 
     select(int) |> 
     collect()
   #> # A tibble: 4 × 2
   #> # Groups:   chr [4]
   #>   chr     int
   #>   <chr> <int>
   #> 1 a         1
   #> 2 b         2
   #> 3 c         3
   #> 4 d         4
   ```
   
   ### Are these changes tested?
   
   Yes, a test was added.
   
   ### Are there any user-facing changes?
   
   Yes, the column order of the result will be different. This could be a breaking
   change because existing code that refers to columns by index may now select a
   different column; however, referring to a column by name is much more common.
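   
   To make the risk concrete, here is a small sketch that uses base R data frames as stand-ins for the collected results before and after this PR. The objects `old_order` and `new_order` are made up for illustration; they only mimic the column order shown above.
   
   ``` r
   # Stand-in for the result before this PR: grouping variable appended last
   old_order <- data.frame(int = 1:4, chr = letters[1:4], stringsAsFactors = FALSE)
   # Stand-in for the result after this PR: grouping variable placed first
   new_order <- data.frame(chr = letters[1:4], int = 1:4, stringsAsFactors = FALSE)
   
   # Positional access picks up a different column after the change
   class(old_order[[1]])
   class(new_order[[1]])
   
   # Name-based access is unaffected by the reordering
   identical(old_order[["int"]], new_order[["int"]])
   ```
   
   Code that indexes by position would need to be updated, while code that indexes by name keeps working.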


