nealrichardson commented on a change in pull request #9716:
URL: https://github.com/apache/arrow/pull/9716#discussion_r595389737



##########
File path: r/R/dplyr.R
##########
@@ -42,7 +42,10 @@ arrow_dplyr_query <- function(.data) {
       filtered_rows = TRUE,
       # group_by_vars is a character vector of columns (as renamed)
       # in the data. They will be kept when data is pulled into R.
-      group_by_vars = character()
+      group_by_vars = character(),
+      # drop_empty_groups is a logical value indicating whether to drop
+      # groups formed by factor levels that don't appear in the data
+      drop_empty_groups = TRUE

Review comment:
       This is pedantic and probably doesn't matter, but logically should this 
be:
   
   ```suggestion
         drop_empty_groups = group_by_drop_default(.data)
   ```

##########
File path: r/R/dplyr.R
##########
@@ -42,7 +42,10 @@ arrow_dplyr_query <- function(.data) {
       filtered_rows = TRUE,
       # group_by_vars is a character vector of columns (as renamed)
       # in the data. They will be kept when data is pulled into R.
-      group_by_vars = character()
+      group_by_vars = character(),
+      # drop_empty_groups is a logical value indicating whether to drop
+      # groups formed by factor levels that don't appear in the data
+      drop_empty_groups = TRUE

Review comment:
       Or should it be `NULL`/unset?

##########
File path: r/R/dplyr.R
##########
@@ -424,11 +427,16 @@ restore_dplyr_features <- function(df, query) {
   if (grouped) {
     # Preserve groupings, if present
     if (is.data.frame(df)) {
-      df <- dplyr::grouped_df(df, dplyr::group_vars(query))
+      df <- dplyr::grouped_df(
+          df,
+          dplyr::group_vars(query),
+          drop = group_by_drop_default(query)
+        )

Review comment:
       nit: i think these lines are indented 2 spaces too much?

##########
File path: r/R/dplyr.R
##########
@@ -424,11 +427,16 @@ restore_dplyr_features <- function(df, query) {
   if (grouped) {

Review comment:
       Should this also check that `!identical(query$group_by_vars, 
group_by_drop_default(df))`? Is it possible to have 0 group by vars but have 
`drop` set?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to