TPDeramus commented on issue #43207: URL: https://github.com/apache/arrow/issues/43207#issuecomment-2223856365
Hi Arrow Devs.

Some individuals in the Posit forums found a solution, and it prompted some discussion we thought might be worth sending your way: https://forum.posit.co/t/arrow-with-tidyverse-calling-min-max-mean-with-summarize-on-arrow-tables/188985

"_dplyr::across() also supports a purrr-style lambda definition, which strangely seems to work in arrow where the other methods failed._"

```
data.frame(
  Participant = c('Greg', 'Greg', 'Donna', 'Donna'),
  Rating = c(21, NA, 17, NA)
) |>
  as_arrow_table() |>
  group_by(Participant) |>
  summarize(across(matches("Rating"), ~max(.x, na.rm = TRUE))) |>
  as.data.frame()

##   Participant Rating
## 1        Greg     21
## 2       Donna     17
```

"_I'm not sure at what points the operations become outsourced to arrow methods, but I don't know whether the ~min(.x, ...) lambda notation somehow tricks dplyr into not outsourcing this operation to arrow. With dbplyr, everything is converted to SQL queries instead and you can view the SQL query to check it. Is there an equivalent arrow command that lets you see what commands are sent to arrow?_"

Would any of you be willing to explain how this works on the backend? Happy to pass it on.
