paleolimbot commented on a change in pull request #11612:
URL: https://github.com/apache/arrow/pull/11612#discussion_r744631444



##########
File path: r/tests/testthat/test-dplyr-summarize.R
##########
@@ -879,3 +879,20 @@ test_that("summarize() handles group_by .drop", {
     )
   )
 })
+
+test_that("summarise() passes through type information for temporary columns", {
+  # applies to ifelse and case_when(), in which argument types are checked
+  # within a translated function (previously this failed because the appropriate
+  # schema was not available for n() > 1, mean(y), and mean(z))
+  compare_dplyr_binding(
+    .input %>%
+      group_by(x) %>%
+      summarise(r = ifelse(n() > 1, mean(y), mean(z))) %>%
+      collect(),
+    tibble(
+      x = c(0, 1, 1),
+      y = c(2, 3, 5),
+      z = c(8, 13, 21)
+    )
+  )

Review comment:
       It doesn't, and I wonder whether all the elements of inner expressions have to be aggregate expressions. FWIW, I get:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)
   
   RecordBatch$create(x = c(0, 1, 1), y = c(2, 3, 5), z = c(8, 13, 21)) %>%
     mutate(new_col = x + 0.1) %>%
     group_by(x) %>%
     summarise(r = ifelse(new_col > 1, mean(y), mean(z))) %>%
     collect()
   #> Warning: Error : Expression ifelse(new_col > 1, mean(y), mean(z)) not supported
   #> in Arrow; pulling data into R
   #> `summarise()` has grouped output by 'x'. You can override using the `.groups` argument.
   #> # A tibble: 3 × 2
   #> # Groups:   x [2]
   #>       x     r
   #>   <dbl> <dbl>
   #> 1     0     8
   #> 2     1     4
   #> 3     1     4
   ```
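   For comparison, here is a plain dplyr sketch (no Arrow involved; same data as the test above) in which every element of the inner expression is an aggregate. Wrapping `new_col` in `max()` is only an illustrative assumption, not something Arrow is confirmed to support. It collapses the condition to one value per group, so `summarise()` returns one row per group (r = 8 for x = 0, r = 4 for x = 1) rather than one row per input row as in the fallback output above.

   ```r
   library(dplyr, warn.conflicts = FALSE)

   df <- tibble(x = c(0, 1, 1), y = c(2, 3, 5), z = c(8, 13, 21))

   # Every input to ifelse() is an aggregate here (max() is a hypothetical
   # choice for collapsing new_col), so each group yields exactly one row.
   res <- df %>%
     mutate(new_col = x + 0.1) %>%
     group_by(x) %>%
     summarise(r = ifelse(max(new_col) > 1, mean(y), mean(z)))
   res
   ```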
   



