thisisnic commented on a change in pull request #11184:
URL: https://github.com/apache/arrow/pull/11184#discussion_r712134084
##########
File path: r/R/dplyr-summarize.R
##########
@@ -97,6 +93,35 @@ do_arrow_summarize <- function(.data, ..., .groups = NULL) {
       ctx$post_mutate
     )[c(.data$group_by_vars, names(exprs))]
   }
+
+  # Handle .groups argument
+  if (length(.data$group_by_vars)) {
+    if (is.null(.groups)) {
+      # dplyr docs say:
+      # When ‘.groups’ is not specified, it is chosen based on the
+      # number of rows of the results:
+      # • If all the results have 1 row, you get "drop_last".
+      # • If the number of rows varies, you get "keep".
+      #
+      # But we don't support anything that returns multiple rows now
+      .groups <- "drop_last"
+    } else {
+      assert_that(is.string(.groups))
Review comment:
Sure, that makes a lot of sense, and good point about it being an
experimental feature. Sounds like something to leave for now and revisit if it
comes up again elsewhere.