thisisnic commented on a change in pull request #11184:
URL: https://github.com/apache/arrow/pull/11184#discussion_r712104449



##########
File path: r/R/dplyr-summarize.R
##########
@@ -97,6 +93,35 @@ do_arrow_summarize <- function(.data, ..., .groups = NULL) {
       ctx$post_mutate
     )[c(.data$group_by_vars, names(exprs))]
   }
+
+  # Handle .groups argument
+  if (length(.data$group_by_vars)) {
+    if (is.null(.groups)) {
+      # dplyr docs say:
+      # When ‘.groups’ is not specified, it is chosen based on the
+      # number of rows of the results:
+      # • If all the results have 1 row, you get "drop_last".
+      # • If the number of rows varies, you get "keep".
+      #
+      # But we don't support anything that returns multiple rows now
+      .groups <- "drop_last"
+    } else {
+      assert_that(is.string(.groups))

Review comment:
       If the length of `.groups` is > 1, it's caught here, which raises an 
error and drops to the R level, which also raises an error.
   
   However, the error message from `dplyr` isn't quite fit for our purposes - 
it mentions that `rowwise` is a valid value for `.groups`, which is true in 
`dplyr` but not in `arrow`.  Is it worth catching this case ourselves and 
providing our own error message, which doesn't mention `rowwise`?
   
   [Edited after reading the tests below] Or is this something worth living 
with, as the alternative would require significant work to implement for little 
actual benefit?
   
   
   ```
   mtcars %>% 
      Table$create() %>%
      group_by(mpg) %>% 
      summarise(.groups = c("drop", "keep")) %>%
      collect()
   # Warning: Error : .groups is not a string (a length one character vector).; 
pulling data into R
   # Error: `.groups` can't be <chr>
   # ℹ Possible values are NULL (default), "drop_last", "drop", "keep", and 
"rowwise"
   # Run `rlang::last_error()` to see where the error occurred.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to