ianmcook commented on code in PR #33917:
URL: https://github.com/apache/arrow/pull/33917#discussion_r1095184265
##########
r/R/dplyr-summarize.R:
##########
@@ -322,15 +301,76 @@ arrow_eval_or_stop <- function(expr, mask) {
out
}
+# This function returns a list of expressions that can be used to project the
+# data before an aggregation to only the fields required for the aggregation,
+# including the fields used in the aggregations (the "targets") and the group
+# fields. The names of the returned list are used to ensure that the projection
+# node is wired up correctly to the aggregation node.
summarize_projection <- function(.data) {
c(
- map(.data$aggregations, ~ .$data),
+ unlist(unname(imap(
+ .data$aggregations,
+ ~set_names(
+ .x$data,
+ aggregate_target_names(.x$data, .y)
+ )
+ ))),
+ .data$selected_columns[.data$group_by_vars]
+ )
+}
+
+# This function determines what names to give to the fields used in
aggregations
+# (the "targets"). When an aggregate function takes 2 or more fields as
targets,
+# this function gives the fields unique names by appending `..1`, `..2`, etc.
Review Comment:
I plan to do a PR for #33959 soon. I would prefer to leave this code in and
change it as needed in that PR. In the meantime, it is unreachable code, which
I know isn't great. But taking it out leaves things in an awkward state where
obvious cases are unhandled in the code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]