ianmcook commented on code in PR #33917:
URL: https://github.com/apache/arrow/pull/33917#discussion_r1103185267
##########
r/R/dplyr-summarize.R:
##########
@@ -322,15 +301,71 @@ arrow_eval_or_stop <- function(expr, mask) {
out
}
+# This function returns a list of expressions which is used to project the data
+# before an aggregation. This list includes the fields used in the aggregation
+# expressions (the "targets") and the group fields. The names of the returned
+# list are used to ensure that the projection node is wired up correctly to the
+# aggregation node.
summarize_projection <- function(.data) {
c(
- map(.data$aggregations, ~ .$data),
+ unlist(unname(imap(
+ .data$aggregations,
+ ~set_names(
+ .x$data,
+ aggregate_target_names(.x$data, .y)
+ )
+ ))),
.data$selected_columns[.data$group_by_vars]
)
}
+# This function determines what names to give to the fields used in an
+# aggregation expression (the "targets"). When an aggregate function takes 2 or
+# more fields as targets, this function gives the fields unique names by
+# appending `..1`, `..2`, etc. When an aggregate function is nullary, this
+# function returns a zero-length character vector.
+aggregate_target_names <- function(data, name) {
+ if (length(data) > 1) {
+ paste(name, seq_along(data), sep = "..")
+ } else if (length(data) > 0) {
+ name
+ } else {
+ character(0)
+ }
+}
+
+# This function returns a named list of the data types of the aggregate columns
+# returned by an aggregation
+aggregate_types <- function(.data, hash, schema = NULL) {
+ map(
+ .data$aggregations,
+ ~if (hash) {
+ Expression$create(
+ paste0("hash_", .$fun),
+ # hash aggregate kernels must be passed another argument representing
+ # the groups, so we pass in a dummy scalar, since the groups will not
+ # affect the type that an aggregation returns
+ args = c(.$data, Scalar$create(1L, uint32())),
Review Comment:
Defining it outside of a function doesn't work because of build order. So I
did the next best thing and pulled the definition outside of the `map()`, so
that at least it incurs the overhead only once per function call.
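
For illustration, here is a minimal sketch of that kind of hoisting, written with base R `lapply()` rather than `map()` so it runs without any packages. The names `make_dummy_groups()`, `types_inside()`, and `types_hoisted()` are hypothetical and are not the code in this PR; the counter exists only to make the number of constructions visible.

```r
# Hypothetical stand-in for a value that is costly to construct.
# The counter records how many times it gets built.
n_built <- 0L
make_dummy_groups <- function() {
  n_built <<- n_built + 1L
  1L
}

# Variant 1: the value is built inside the per-element function,
# so it is constructed once per aggregation.
types_inside <- function(aggregations) {
  lapply(aggregations, function(agg) {
    dummy_groups <- make_dummy_groups()
    list(fun = paste0("hash_", agg$fun), args = c(agg$data, dummy_groups))
  })
}

# Variant 2: the value is built before iterating,
# so it is constructed once per function call.
types_hoisted <- function(aggregations) {
  dummy_groups <- make_dummy_groups()
  lapply(aggregations, function(agg) {
    list(fun = paste0("hash_", agg$fun), args = c(agg$data, dummy_groups))
  })
}

# Two illustrative aggregations (made-up names and fields).
aggs <- list(
  total = list(fun = "sum", data = list("x")),
  avg = list(fun = "mean", data = list("y"))
)

n_built <- 0L
res_inside <- types_inside(aggs)
built_inside <- n_built

n_built <- 0L
res_hoisted <- types_hoisted(aggs)
built_hoisted <- n_built
```

Both variants return the same list; the only difference is how often the shared value is constructed (once per element versus once per call).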