[GitHub] [arrow] nealrichardson commented on a change in pull request #10785: ARROW-13434: [R] group_by() with an unnammed expression

GitBox Fri, 23 Jul 2021 07:20:05 -0700


nealrichardson commented on a change in pull request #10785:
URL: https://github.com/apache/arrow/pull/10785#discussion_r675596476




##########
File path: r/R/dplyr-group-by.R
##########
@@ -18,16 +18,33 @@
 
 # The following S3 methods are registered on load if dplyr is present
 
+#' @importFrom rlang quo_name
 group_by.arrow_dplyr_query <- function(.data,
                                        ...,
                                        .add = FALSE,
                                        add = .add,
                                        .drop = 
dplyr::group_by_drop_default(.data)) {
   .data <- arrow_dplyr_query(.data)
-  # ... can contain expressions (i.e. can add (or rename?) columns)
-  # Check for those (they show up as named expressions)
   new_groups <- enquos(...)
-  new_groups <- new_groups[nzchar(names(new_groups))]
+  # ... can contain expressions (i.e. can add (or rename?) columns) and so we
+  # need to identify those and add them on to the query with mutate. 
Specifically,
+  # we want to mark as new:
+  #   * expressions (named or otherwise)
+  #   * variables that have new names
+  # All others (i.e. simple references to variables) should not be (re)-added
+  new_group_ind <- purrr::map_lgl(new_groups, ~!(quo_name(.x) %in% 
names(.data)))

Review comment:
       No need to `purrr::` these, we import them

##########
File path: r/R/dplyr-group-by.R
##########
@@ -18,16 +18,33 @@
 
 # The following S3 methods are registered on load if dplyr is present
 
+#' @importFrom rlang quo_name

Review comment:
       To your points Jon, IMO any function from a package we Import should be 
in `importsFrom` and be used without `::`, and we only use `::` for Suggested 
packages.

##########
File path: r/R/dplyr-group-by.R
##########
@@ -18,16 +18,33 @@
 
 # The following S3 methods are registered on load if dplyr is present
 
+#' @importFrom rlang quo_name
 group_by.arrow_dplyr_query <- function(.data,
                                        ...,
                                        .add = FALSE,
                                        add = .add,
                                        .drop = 
dplyr::group_by_drop_default(.data)) {
   .data <- arrow_dplyr_query(.data)
-  # ... can contain expressions (i.e. can add (or rename?) columns)
-  # Check for those (they show up as named expressions)
   new_groups <- enquos(...)
-  new_groups <- new_groups[nzchar(names(new_groups))]
+  # ... can contain expressions (i.e. can add (or rename?) columns) and so we
+  # need to identify those and add them on to the query with mutate. 
Specifically,
+  # we want to mark as new:
+  #   * expressions (named or otherwise)
+  #   * variables that have new names
+  # All others (i.e. simple references to variables) should not be (re)-added
+  new_group_ind <- purrr::map_lgl(new_groups, ~!(quo_name(.x) %in% 
names(.data)))
+  named_group_ind <- purrr::map_lgl(names(new_groups), nzchar)
+  new_groups <- new_groups[new_group_ind | named_group_ind]
+
+  # now either use the name that was given in ... or if that is "" then use 
the expr
+  names(new_groups) <- purrr::map_chr(seq_along(new_groups), function(i) {
+    name <- names(new_groups)[[i]]
+    if (name == "") {
+      quo_name(new_groups[[i]])
+    } else {
+      name
+    }
+  })

Review comment:
       I figured there must be a `purrr` function do to this (the equivalent of 
python's `enumerate()`) and [it turns out there 
is](https://purrr.tidyverse.org/reference/imap.html). I think it would look 
like this:
   
   ```suggestion
     names(new_groups) <- imap_chr(new_groups, ~ ifelse(.y == "", quo_name(.x), 
.y))
   ```
   
   (Also need to add `imap_chr` to `@importFrom`)

##########
File path: r/R/dplyr-group-by.R
##########
@@ -18,16 +18,33 @@
 
 # The following S3 methods are registered on load if dplyr is present
 
+#' @importFrom rlang quo_name
 group_by.arrow_dplyr_query <- function(.data,
                                        ...,
                                        .add = FALSE,
                                        add = .add,
                                        .drop = 
dplyr::group_by_drop_default(.data)) {
   .data <- arrow_dplyr_query(.data)
-  # ... can contain expressions (i.e. can add (or rename?) columns)
-  # Check for those (they show up as named expressions)
   new_groups <- enquos(...)
-  new_groups <- new_groups[nzchar(names(new_groups))]
+  # ... can contain expressions (i.e. can add (or rename?) columns) and so we
+  # need to identify those and add them on to the query with mutate. 
Specifically,
+  # we want to mark as new:
+  #   * expressions (named or otherwise)
+  #   * variables that have new names
+  # All others (i.e. simple references to variables) should not be (re)-added
+  new_group_ind <- purrr::map_lgl(new_groups, ~!(quo_name(.x) %in% 
names(.data)))
+  named_group_ind <- purrr::map_lgl(names(new_groups), nzchar)
+  new_groups <- new_groups[new_group_ind | named_group_ind]
+
+  # now either use the name that was given in ... or if that is "" then use 
the expr
+  names(new_groups) <- purrr::map_chr(seq_along(new_groups), function(i) {
+    name <- names(new_groups)[[i]]
+    if (name == "") {
+      quo_name(new_groups[[i]])
+    } else {
+      name
+    }
+  })

Review comment:
       I'd also put this inside the `if (length(new_groups))` below, just to be 
clear that this only matters if we have new groups




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] nealrichardson commented on a change in pull request #10785: ARROW-13434: [R] group_by() with an unnammed expression

Reply via email to