TPDeramus opened a new issue, #43207: URL: https://github.com/apache/arrow/issues/43207
### Describe the usage question you have. Please include as many useful details as possible.

Hi Arrow devs.

I wanted to ask about something I noticed when using column-wise operators with `dplyr` on `arrow` tables.

If I have an arrow table and want to run a basic function such as `mean`, `max`, or `min` inside `summarize`, it appears that `arrow` does not currently accept the `na.rm = TRUE` argument, or if it does, I can't find it in the documentation.

Say I started with this original dataset:

| Participant | Rating |
| ------------ | -------- |
| Donna | 17 |
| Donna | NA |
| Greg | 21 |
| Greg | NA |

For example, if these were generic `R` dataframes, either of these two calls would work (though one is deprecated):

```
data.frame(
  Participant = c('Greg', 'Greg', 'Donna', 'Donna'),
  Rating = c(21, NA, 17, NA)
) |>
  group_by(Participant) |>
  summarize(across(matches("Rating"), \(x) max(x, na.rm = TRUE))) |>
  as.data.frame()

data.frame(
  Participant = c('Greg', 'Greg', 'Donna', 'Donna'),
  Rating = c(21, NA, 17, NA)
) |>
  group_by(Participant) |>
  summarize(across(matches("Rating"), max, na.rm = TRUE)) |>
  as.data.frame()
```

Producing:

| Participant | Rating |
| ------------ | -------- |
| Donna | 17 |
| Greg | 21 |

However, when I run the same commands on an arrow table, both throw errors:

```
data.frame(
  Participant = c('Greg', 'Greg', 'Donna', 'Donna'),
  Rating = c(21, NA, 17, NA)
) |>
  as_arrow_table() |>
  group_by(Participant) |>
  summarize(across(matches("Rating"), \(x) max(x, na.rm = TRUE))) |>
  as.data.frame()

Error in `across_setup()`:
! Anonymous functions are not yet supported in Arrow
Run `rlang::last_trace()` to see where the error occurred.

data.frame(
  Participant = c('Greg', 'Greg', 'Donna', 'Donna'),
  Rating = c(21, NA, 17, NA)
) |>
  as_arrow_table() |>
  group_by(Participant) |>
  summarize(across(matches("Rating"), max, na.rm = TRUE)) |>
  as.data.frame()

Error in `expand_across()`:
! `...` argument to `across()` is deprecated in dplyr and not supported in Arrow
Run `rlang::last_trace()` to see where the error occurred.
```

And the one that does work:

```
data.frame(
  Participant = c('Greg', 'Greg', 'Donna', 'Donna'),
  Rating = c(21, NA, 17, NA)
) |>
  as_arrow_table() |>
  group_by(Participant) |>
  summarize(across(matches("Rating"), max)) |>
  as.data.frame()
```

Returns `NA` values, which are not what I want:

| Participant | Rating |
| ------------ | -------- |
| Donna | NA |
| Greg | NA |

Is there a way to pass the `na.rm = TRUE` argument to this call without having to manually drop the `NA` values for each column or row of interest in my data?

### Component(s)

R

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org