jonkeane commented on a change in pull request #11612:
URL: https://github.com/apache/arrow/pull/11612#discussion_r743946777
##########
File path: r/tests/testthat/test-dplyr-summarize.R
##########
@@ -879,3 +879,20 @@ test_that("summarize() handles group_by .drop", {
)
)
})
+
+test_that("summarise() passes through type information for temporary columns",
{
+ # applies to ifelse and case_when(), in which argument types are checked
+ # within a translated function (previously this failed because the
appropriate
+ # schema was not available for n() > 1, mean(y), and mean(z))
+ compare_dplyr_binding(
+ .input %>%
+ group_by(x) %>%
+ summarise(r = ifelse(n() > 1, mean(y), mean(z))) %>%
+ collect(),
+ tibble(
+ x = c(0, 1, 1),
+ y = c(2, 3, 5),
+ z = c(8, 13, 21)
+ )
+ )
Review comment:
Would it be good to test something like the following?
```
.input %>%
mutate(new_col = ...) %>%
group_by(x) %>%
summarise(r = ifelse(New_col > 1, mean(y), mean(z))) %>%
collect()
```
Tt should just work, but would be good to check.
##########
File path: r/R/dplyr-summarize.R
##########
@@ -218,8 +218,16 @@ summarize_eval <- function(name, quosure, ctx, hash,
recurse = FALSE) {
} else if (all(inner_agg_exprs)) {
# Something like: fun(agg(x), agg(y))
# So based on the aggregations that have been extracted, mutate after
+ agg_field_refs <- make_field_refs(names(ctx$aggregations))
+ agg_field_types <- lapply(ctx$aggregations, function(x) x$data$type())
Review comment:
`ctx$aggregations` are (a list of) expressions at this point, yeah?
##########
File path: r/tests/testthat/test-dplyr-summarize.R
##########
@@ -879,3 +879,20 @@ test_that("summarize() handles group_by .drop", {
)
)
})
+
+test_that("summarise() passes through type information for temporary columns",
{
+ # applies to ifelse and case_when(), in which argument types are checked
+ # within a translated function (previously this failed because the
appropriate
+ # schema was not available for n() > 1, mean(y), and mean(z))
+ compare_dplyr_binding(
+ .input %>%
+ group_by(x) %>%
+ summarise(r = ifelse(n() > 1, mean(y), mean(z))) %>%
Review comment:
This is super-duper minor, but I wonder if we should use `if_else()`
here since that is a bit more strict about the types of the `true` and `false`
arguments. IIRC, our arrow `ifelse()` is just as strict (since it uses the
exact same implementation), but the intention of the test + testing the types
of those would be a little bit clear with it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]