paleolimbot opened a new pull request #11612:
URL: https://github.com/apache/arrow/pull/11612


   This PR passes type information on more correctly for the temporary columns 
that are created when nested expressions exist. This information is rarely 
needed, but the expression type is occasionally checked in order to raise an 
error or warning. Two cases where this happens are `ifelse()` and `case_when()`, 
which caused the following (valid) code to pull data into R with a 
confusing message:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)
   
   # motivating example
   RecordBatch$create(x = c(0, 1, 1), y = c(2, 3, 5), z = c(8, 13, 21)) %>%
     group_by(x) %>%
     summarise(r = ifelse(n() > 1, mean(y), mean(z))) %>% 
     collect()
   #> Warning: Error : Expression ifelse(..temp0 > 1, ..temp1, ..temp2) not supported
   #> in Arrow; pulling data into R
   #> # A tibble: 2 × 2
   #>       x     r
   #>   <dbl> <dbl>
   #> 1     0     8
   #> 2     1     4
   ```
   
   After this PR, the above works without warning:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)
   
   # motivating example
   RecordBatch$create(x = c(0, 1, 1), y = c(2, 3, 5), z = c(8, 13, 21)) %>%
     group_by(x) %>%
     summarise(r = ifelse(n() > 1, mean(y), mean(z))) %>% 
     collect()
   #> # A tibble: 2 × 2
   #>       x     r
   #>   <dbl> <dbl>
   #> 1     0     8
   #> 2     1     4
   ```
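   
   Since `case_when()` performs the same type check, an analogous query is a 
   useful second test. The sketch below is an illustration rather than part of 
   the PR's test suite: it assumes `case_when()` inside `summarise()` behaves 
   like the `ifelse()` example, and whether it runs without the warning depends 
   on the installed arrow version. The name `result` is introduced here for 
   illustration only.
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)
   
   # same data and grouping as the motivating example, using case_when()
   # (assumption: the temporary columns ..temp0, ..temp1, ..temp2 now carry
   # their types, so the type check in case_when() passes)
   result <- RecordBatch$create(x = c(0, 1, 1), y = c(2, 3, 5), z = c(8, 13, 21)) %>%
     group_by(x) %>%
     summarise(r = case_when(n() > 1 ~ mean(y), TRUE ~ mean(z))) %>%
     arrange(x) %>%
     collect()
   
   result
   ```
   
   If the fallback still occurs, the query returns the same values but emits 
   the "pulling data into R" warning, so checking for the absence of a warning 
   is what confirms the fix.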
   
   I imagine I'm missing some of the complexity here...feel free to let me know!



