jonkeane commented on a change in pull request #11612:
URL: https://github.com/apache/arrow/pull/11612#discussion_r745146259
##########
File path: r/tests/testthat/test-dplyr-summarize.R
##########
@@ -879,3 +879,20 @@ test_that("summarize() handles group_by .drop", {
)
)
})
+
+test_that("summarise() passes through type information for temporary columns", {
+  # applies to ifelse and case_when(), in which argument types are checked
+  # within a translated function (previously this failed because the appropriate
+  # schema was not available for n() > 1, mean(y), and mean(z))
+  compare_dplyr_binding(
+    .input %>%
+      group_by(x) %>%
+      summarise(r = ifelse(n() > 1, mean(y), mean(z))) %>%
+      collect(),
+    tibble(
+      x = c(0, 1, 1),
+      y = c(2, 3, 5),
+      z = c(8, 13, 21)
+    )
+  )
Review comment:
Hmm, ok, I see more of what's going on here. My first question was a bit
off, since at the `ifelse()` stage we really should be evaluating aggregates in
its first argument (as in the example you gave above and in the tests, where
we're looking at `n()` for the group).
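For reference, here is the expression from the new test evaluated in plain dplyr on the same tibble, with no Arrow involved. As far as I understand `compare_dplyr_binding()`, this is roughly the result the Arrow output gets compared against. It is only a sketch that assumes dplyr is installed; `tbl` and `expected` are just illustrative names. The condition `n() > 1` and both branches are all aggregates, evaluated once per group:

```r
library(dplyr, warn.conflicts = FALSE)

# Same input data as the new test
tbl <- tibble(
  x = c(0, 1, 1),
  y = c(2, 3, 5),
  z = c(8, 13, 21)
)

# The condition and both branches of ifelse() are per-group aggregates.
# Group x = 0 has one row, so r = mean(z) = 8;
# group x = 1 has two rows, so r = mean(y) = (3 + 5) / 2 = 4.
expected <- tbl %>%
  group_by(x) %>%
  summarise(r = ifelse(n() > 1, mean(y), mean(z)))

expected
```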
But I'm not 100% sure the aggregate expressions are the problem here.
This new column is admittedly a little strange (though I can imagine
situations where it's more or less what someone wants to do!), but I'm
surprised that this still errors:
``` r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
tab <- Table$create(starwars)
tab %>%
  group_by(gender) %>%
  summarise(
    height_mean = mean(height, na.rm = TRUE),
    height_median = median(height, na.rm = TRUE),
    height_evil_avg = if_else(
      height_mean > 178,
      mean(height, na.rm = TRUE),
      as.double(median(height, na.rm = TRUE))
    )) %>%
  collect()
#> Warning: median() currently returns an approximate median in Arrow
#> Warning: median() currently returns an approximate median in Arrow
#> Warning: Error : Expression if_else(height_mean > 178, ..temp2, as.double(..temp3)) not supported in Arrow; pulling data into R
#> # A tibble: 3 × 4
#>   gender    height_mean height_median height_evil_avg
#>   <chr>           <dbl>         <dbl>           <dbl>
#> 1 feminine         165.          166.            166.
#> 2 masculine        177.          183             183 
#> 3 <NA>             181.          183             181.
```
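To make the shape of the failing query explicit: judging from the warning, the aggregates inside `if_else()` have already been replaced by temporary columns (presumably `..temp2` for `mean(height, na.rm = TRUE)` and `..temp3` for the median), and what gets rejected is the row-wise `if_else()` over those per-group results. In plain dplyr on a tibble, the same computation can be written as an aggregation followed by a separate `mutate()`. This is only a sketch of the equivalent two-step query; I haven't checked whether Arrow accepts this split form, and `grouped_stats` is a hypothetical name.

```r
library(dplyr, warn.conflicts = FALSE)

# Step 1: the per-group aggregations (what the temporary columns stand for).
# Note: base R median() is exact here, while the Arrow version is approximate.
grouped_stats <- starwars %>%
  group_by(gender) %>%
  summarise(
    height_mean = mean(height, na.rm = TRUE),
    height_median = median(height, na.rm = TRUE)
  )

# Step 2: a row-wise choice over the aggregated values.
# Within a group, mean(height, na.rm = TRUE) equals height_mean, so it is reused.
grouped_stats <- grouped_stats %>%
  mutate(height_evil_avg = if_else(
    height_mean > 178,
    height_mean,
    as.double(height_median)
  ))

grouped_stats
```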
It's not immediately clear to me that this is the same thing as ARROW-14586,
but maybe they are related? Anyway, if there's no obvious way to get this
working here, that's totally fine, but maybe we should add this example to
ARROW-14586, or create a new JIRA for making this kind of expression work.
--
This is an automated message from the Apache Git Service.