[GitHub] [arrow] nealrichardson commented on a diff in pull request #33917: GH-33892: [R] Map `dplyr::n()` to `count_all` kernel

via GitHub Fri, 10 Feb 2023 11:29:35 -0800


nealrichardson commented on code in PR #33917:
URL: https://github.com/apache/arrow/pull/33917#discussion_r1103153900



##########
r/R/dplyr-summarize.R:
##########
@@ -322,15 +301,71 @@ arrow_eval_or_stop <- function(expr, mask) {
   out
 }
 
+# This function returns a list of expressions which is used to project the data
+# before an aggregation. This list includes the fields used in the aggregation
+# expressions (the "targets") and the group fields. The names of the returned
+# list are used to ensure that the projection node is wired up correctly to the
+# aggregation node.
 summarize_projection <- function(.data) {
   c(
-    map(.data$aggregations, ~ .$data),
+    unlist(unname(imap(
+      .data$aggregations,
+      ~set_names(
+        .x$data,
+        aggregate_target_names(.x$data, .y)
+      )
+    ))),
     .data$selected_columns[.data$group_by_vars]
   )
 }
 
+# This function determines what names to give to the fields used in an
+# aggregation expression (the "targets"). When an aggregate function takes 2 or
+# more fields as targets, this function gives the fields unique names by
+# appending `..1`, `..2`, etc. When an aggregate function is nullary, this
+# function returns a zero-length character vector.
+aggregate_target_names <- function(data, name) {
+  if (length(data) > 1) {
+    paste(name, seq_along(data), sep = "..")
+  } else if (length(data) > 0) {
+    name
+  } else {
+    character(0)
+  }
+}
+
+# This function returns a named list of the data types of the aggregate columns
+# returned by an aggregation
+aggregate_types <- function(.data, hash, schema = NULL) {

Review Comment:
   That could predate the schema tracking in Expressions. I'd try removing it, 
and if the tests pass, I guess it's not needed? After all, that's what the 
tests are there to tell us.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
