TPDeramus commented on issue #43207:
URL: https://github.com/apache/arrow/issues/43207#issuecomment-2223856365

   Hi Arrow Devs.
   
   Some users on the Posit forum found a workaround, and it prompted some discussion that we thought might be worth sending your way:
   
https://forum.posit.co/t/arrow-with-tidyverse-calling-min-max-mean-with-summarize-on-arrow-tables/188985
   
   "_dplyr::across() also supports a purrr-style lambda definition, which 
strangely seems to work in arrow where the other methods failed._"
   
   ```
   library(arrow)  # as_arrow_table()
   library(dplyr)  # group_by(), summarize(), across(), matches()

   data.frame(
     Participant = c('Greg', 'Greg', 'Donna', 'Donna'),
     Rating = c(21, NA, 17, NA)
   ) |>
     as_arrow_table() |>
     group_by(Participant) |>
     # purrr-style lambda inside across()
     summarize(across(matches("Rating"), ~max(.x, na.rm = TRUE))) |>
     as.data.frame()
   ##   Participant Rating
   ## 1        Greg     21
   ## 2       Donna     17
   ```
   
   "_I'm not sure at what points the operations become outsourced to arrow 
methods, but I don't know whether the ~min(.x, ...) lambda notation somehow 
tricks dplyr into not outsourcing this operation to arrow.
   
   With dbplyr, everything is converted to SQL queries instead and you can view 
the SQL query to check it. Is there an equivalent arrow command that lets you 
see what commands are sent to arrow?_"
   
   Would any of you be willing to explain how this works on the backend?
   
   Happy to pass it on.

