[ 
https://issues.apache.org/jira/browse/ARROW-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461541#comment-17461541
 ] 

Dewey Dunnington commented on ARROW-14209:
------------------------------------------

The {{n_distinct()}} binding is defined here: 
https://github.com/apache/arrow/blob/6e20c6b9d7131af41f2e979529d06e507c731373/r/R/dplyr-functions.R#L1091-L1097

Reprex:

{code:R}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

record_batch(
  a = c(1, 1, 2, 2, 1, NA, NA),
  b = c("a", "b", "c", "c", "a", "b", NA)
) %>% 
  summarise(
    distinct_vals_with_na = n_distinct(a, b, na.rm = FALSE),
    distinct_vals = n_distinct(a, b, na.rm = TRUE)
  )
#> Warning: Error : In n_distinct(a, b, na.rm = FALSE), Multiple arguments to
#> n_distinct() not supported in Arrow; pulling data into R
#> # A tibble: 1 × 2
#>   distinct_vals_with_na distinct_vals
#>                   <int>         <int>
#> 1                     5             3
{code}
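
For reference, a minimal base R sketch of the semantics the reprex shows after falling back to R. This is not the Arrow implementation; the names {{df}}, {{n_with_na}} and {{n_without_na}} are only for illustration, and the {{na.rm = TRUE}} handling (dropping any row with an NA in either column) is an assumption inferred from the output above:

{code:R}
# Same data as the reprex, as a plain data.frame
df <- data.frame(
  a = c(1, 1, 2, 2, 1, NA, NA),
  b = c("a", "b", "c", "c", "a", "b", NA)
)

# Count unique (a, b) combinations, keeping rows that contain NA
n_with_na <- nrow(unique(df))
n_with_na
#> [1] 5

# Drop rows with an NA in any column first, then count combinations
n_without_na <- nrow(unique(df[complete.cases(df), ]))
n_without_na
#> [1] 3
{code}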


> [R] Allow multiple arguments to n_distinct()
> --------------------------------------------
>
>                 Key: ARROW-14209
>                 URL: https://issues.apache.org/jira/browse/ARROW-14209
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Priority: Major
>             Fix For: 7.0.0
>
>
> ARROW-13620 and ARROW-14036 added support for the {{n_distinct()}} function 
> in the dplyr verb {{summarise()}} but only with a single argument. Add 
> support for multiple arguments to {{n_distinct()}}. This should return the 
> number of unique combinations of values in the specified columns/expressions.
> See the comment about this here: 
> [https://github.com/apache/arrow/pull/11257#discussion_r720873549]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
