[
https://issues.apache.org/jira/browse/ARROW-15679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ian Cook resolved ARROW-15679.
------------------------------
Fix Version/s: 8.0.0
Resolution: Fixed
Issue resolved by pull request 12435
[https://github.com/apache/arrow/pull/12435]
> [R] count should return an ungrouped dataframe
> ----------------------------------------------
>
> Key: ARROW-15679
> URL: https://issues.apache.org/jira/browse/ARROW-15679
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 7.0.0
> Reporter: Sam Albers
> Assignee: Sam Albers
> Priority: Major
> Labels: pull-request-available
> Fix For: 8.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Unless the data is grouped beforehand, `dplyr::count` returns an ungrouped
> data.frame. The arrow implementation instead leaves grouping variables on the result:
>
> {code:java}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> tf1 <- tempfile()
> dir.create(tf1)
> starwars |>
>   write_dataset(tf1)
> # no group ----------------------------------------------------------------
> ## dplyr behaviour
> count_dplyr_no_group <- starwars %>%
>   count(gender, homeworld, species)
> group_vars(count_dplyr_no_group)
> #> character(0)
> ## arrow behaviour
> count_arrow_no_group <- open_dataset(tf1) %>%
>   count(gender, homeworld, species) %>%
>   collect()
> group_vars(count_arrow_no_group)
> #> [1] "gender" "homeworld"
> {code}
> If I am correct that this is undesired behaviour, I think it can be fixed
> [here|https://github.com/apache/arrow/blob/5ad5ddcafee8fada9cebb341df638b750c98efb7/r/R/dplyr-count.R#L20-L35]
> using this patch:
>
> {code:java}
> count.arrow_dplyr_query <- function(x, ..., wt = NULL, sort = FALSE, name = NULL) {
>   if (!missing(...)) {
>     out <- dplyr::group_by(x, ..., .add = TRUE)
>   } else {
>     out <- x
>   }
>   out <- dplyr::tally(out, wt = {{ wt }}, sort = sort, name = name)
>   gv <- dplyr::group_vars(x)
>   if (rlang::is_empty(gv)) {
>     out <- dplyr::ungroup(out)
>   } else {
>     # Restore original group vars
>     out$group_by_vars <- gv
>   }
>   out
> }
> {code}
>
> I can submit a PR with some tests if that would be helpful.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)