jonkeane commented on code in PR #41223:
URL: https://github.com/apache/arrow/pull/41223#discussion_r1566578934
##########
r/R/dplyr-summarize.R:
##########
@@ -221,25 +257,27 @@ do_arrow_summarize <- function(.data, ..., .groups = NULL) {
# It's more complex than other places because a single summarize() expr
# may result in multiple query nodes (Aggregate, Project),
# and we have to walk through the expressions to disentangle them.
- ctx <- env(
- mask = arrow_mask(.data, aggregation = TRUE),
- aggregations = empty_named_list(),
- post_mutate = empty_named_list()
- )
+
+ # Agg functions pull out the aggregation info and append it here
+ ..aggregations <- empty_named_list()
+ # And if there are any transformations after the aggregation, they go here
+ ..post_mutate <- empty_named_list()
+ mask <- arrow_mask(.data, aggregation = TRUE)
+
for (i in seq_along(exprs)) {
# Iterate over the indices and not the names because names may be repeated
# (which overwrites the previous name)
summarize_eval(
names(exprs)[i],
exprs[[i]],
- ctx,
+ mask,
length(.data$group_by_vars) > 0
)
}
# Apply the results to the .data object.
# First, the aggregations
- .data$aggregations <- ctx$aggregations
+ .data$aggregations <- ..aggregations
Review Comment:
Aaah yes, got it. I hadn't quite put together that `..aggregations` isn't
at package scope but local to this call, which is really clever. And because
it's transient within the call, we don't have to worry about flushing it at the
end, cleaning it up, or otherwise managing state, yeah?
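
To check my understanding, here is a minimal base-R sketch of the pattern as I read it. `record_agg()` and `summarize_sketch()` are hypothetical names made up for illustration, and using `parent.frame()` to locate the list is my assumption, not necessarily how the PR finds it:

```r
# Hypothetical helper (not from the PR): appends an entry to the
# `..aggregations` list that lives in the caller's frame.
record_agg <- function(name, fun, env = parent.frame()) {
  # inherits = FALSE: only look in the caller's own frame, never further up
  aggs <- get("..aggregations", envir = env, inherits = FALSE)
  aggs[[name]] <- fun
  assign("..aggregations", aggs, envir = env)
  invisible(NULL)
}

# Hypothetical stand-in for do_arrow_summarize()
summarize_sketch <- function(agg_names) {
  # Local to this call: a fresh, empty list on every invocation
  ..aggregations <- list()
  for (nm in agg_names) {
    record_agg(nm, "sum")
  }
  ..aggregations
}

first <- summarize_sketch(c("a", "b"))
second <- summarize_sketch("c")
```

If that matches the real behavior, `second` only holds the entry for `c`: nothing from the first call carries over, because the list disappears with the function's frame when the call returns.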