paleolimbot opened a new pull request #11612:
URL: https://github.com/apache/arrow/pull/11612
This PR propagates type information more correctly to the temporary columns
that are created when nested expressions exist. This isn't needed often, but
the expression type is occasionally checked in order to raise an error or a
warning. Two places where this happens are `ifelse()` and `case_when()`,
which caused the following (valid) code to pull data into R with a
confusing message:
``` r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
# motivating example
RecordBatch$create(x = c(0, 1, 1), y = c(2, 3, 5), z = c(8, 13, 21)) %>%
  group_by(x) %>%
  summarise(r = ifelse(n() > 1, mean(y), mean(z))) %>%
  collect()
#> Warning: Error : Expression ifelse(..temp0 > 1, ..temp1, ..temp2) not supported
#> in Arrow; pulling data into R
#> # A tibble: 2 × 2
#>       x     r
#>   <dbl> <dbl>
#> 1     0     8
#> 2     1     4
```
After this PR, the above works without warning:
``` r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
# motivating example
RecordBatch$create(x = c(0, 1, 1), y = c(2, 3, 5), z = c(8, 13, 21)) %>%
  group_by(x) %>%
  summarise(r = ifelse(n() > 1, mean(y), mean(z))) %>%
  collect()
#> # A tibble: 2 × 2
#>       x     r
#>   <dbl> <dbl>
#> 1     0     8
#> 2     1     4
```
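The description names `case_when()` as the second affected function but only shows `ifelse()`. A minimal sketch of the equivalent `case_when()` query on the same data follows. It is an illustration, not part of the original reprex: the variable name `res` and the trailing `arrange()` are assumptions added to make the row order deterministic, and whether a fallback warning appears depends on the installed arrow version.

``` r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Same data and grouping as the motivating example, written with case_when().
# The two conditions are mutually exclusive, so this mirrors the ifelse() call.
res <- RecordBatch$create(x = c(0, 1, 1), y = c(2, 3, 5), z = c(8, 13, 21)) %>%
  group_by(x) %>%
  summarise(r = case_when(
    n() > 1 ~ mean(y),   # groups with more than one row: mean of y
    n() <= 1 ~ mean(z)   # single-row groups: mean of z
  )) %>%
  collect() %>%
  arrange(x)             # sort after collecting so the row order is fixed

res
```

The values should match the `ifelse()` result (`r` is 8 for `x = 0` and 4 for `x = 1`) whether the expression is evaluated in Arrow or falls back to R.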
I imagine I'm missing some of the complexity here...feel free to let me know!