nealrichardson commented on a change in pull request #11184:
URL: https://github.com/apache/arrow/pull/11184#discussion_r712127803



##########
File path: r/R/dplyr-summarize.R
##########
@@ -97,6 +93,35 @@ do_arrow_summarize <- function(.data, ..., .groups = NULL) {
       ctx$post_mutate
     )[c(.data$group_by_vars, names(exprs))]
   }
+
+  # Handle .groups argument
+  if (length(.data$group_by_vars)) {
+    if (is.null(.groups)) {
+      # dplyr docs say:
+      # When ‘.groups’ is not specified, it is chosen based on the
+      # number of rows of the results:
+      # • If all the results have 1 row, you get "drop_last".
+      # • If the number of rows varies, you get "keep".
+      #
+      # But we don't support anything that returns multiple rows now
+      .groups <- "drop_last"
+    } else {
+      assert_that(is.string(.groups))

Review comment:
       I'm open to suggestions, and I'll think some more about whether we 
have this problem anywhere else, but here is my thinking for leaving it as it 
is: (1) when querying a dataset, it will just error and tell you to pull the 
data into R (where rowwise would work); (2) it's an experimental feature in 
dplyr, so I don't know how much we should worry about providing the smoothest 
experience when you go off the default.
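
For reference, here is a minimal sketch of the dplyr behavior being discussed, run on a plain in-memory data frame rather than an Arrow dataset. The use of `mtcars` and the variable names are illustrative assumptions, not code from this PR; it only shows what each `.groups` value does in dplyr itself once the data is in R.

```r
library(dplyr)

# Illustrative data only: mtcars grouped by two columns.
by_two <- group_by(mtcars, cyl, am)

# "drop_last" is what dplyr picks by default when every group yields one row.
res_drop_last <- summarise(by_two, mpg = mean(mpg), .groups = "drop_last")
# "keep" retains all grouping variables; "drop" removes grouping entirely.
res_keep <- summarise(by_two, mpg = mean(mpg), .groups = "keep")
res_drop <- summarise(by_two, mpg = mean(mpg), .groups = "drop")
# "rowwise" returns a rowwise data frame, which is the off-default case above.
res_rowwise <- summarise(by_two, mpg = mean(mpg), .groups = "rowwise")

group_vars(res_drop_last)
group_vars(res_keep)
group_vars(res_drop)
inherits(res_rowwise, "rowwise_df")
```

The first three calls show how the grouping of the result changes with `.groups`, and the last one checks that `"rowwise"` yields a rowwise data frame in dplyr.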



