[
https://issues.apache.org/jira/browse/ARROW-16577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Keane closed ARROW-16577.
----------------------------------
Resolution: Duplicate
> [R] dplyr `n` function cannot be called with `dplyr::n()`
> ---------------------------------------------------------
>
> Key: ARROW-16577
> URL: https://issues.apache.org/jira/browse/ARROW-16577
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Sam Bashevkin
> Priority: Major
>
> I am trying to summarize an Arrow dataset in R using the `n` function from
> dplyr, but it fails when called with the namespaced `dplyr::n()` syntax,
> even though it works fine as plain `n()`. The `n_distinct` function has the
> same issue.
> ``` r
> library(arrow)
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #> timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #> intersect, setdiff, setequal, union
> dir<-file.path(tempdir(), "test-data")
> test_data <- data.frame(A=1:10)
> write_dataset(test_data, dir)
> # This does work
> data2 <- open_dataset(dir) %>%
>   summarise(N = n())
> data2
> #> FileSystemDataset (query)
> #> N: int32
> #>
> #> See $.data for the source Arrow object
> collect(data2)
> #> # A tibble: 1 × 1
> #> N
> #> <int>
> #> 1 10
> # But this does not work
> data1 <- open_dataset(dir) %>%
>   summarise(N = dplyr::n())
> #> Error: Error : Expression dplyr::n() not supported in Arrow
> #> Call collect() first to pull data into R.
> data1
> #> Error in eval(expr, envir, enclos): object 'data1' not found
> ```
> <sup>Created on 2022-05-13 by the [reprex
> package](https://reprex.tidyverse.org) (v2.0.1)</sup>
> <details style="margin-bottom:10px;">
> <summary>
> Session info
> </summary>
> ``` r
> sessioninfo::session_info()
> #> ─ Session info ───────────────────────────────────────────────────────────────
> #> setting value
> #> version R version 4.2.0 (2022-04-22 ucrt)
> #> os Windows 10 x64 (build 19044)
> #> system x86_64, mingw32
> #> ui RTerm
> #> language (EN)
> #> collate English_United States.utf8
> #> ctype English_United States.utf8
> #> tz America/Los_Angeles
> #> date 2022-05-13
> #> pandoc 2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
> #>
> #> ─ Packages ───────────────────────────────────────────────────────────────────
> #> package * version date (UTC) lib source
> #> arrow * 8.0.0 2022-05-09 [1] CRAN (R 4.2.0)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.2.0)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.0)
> #> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)
> #> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)
> #> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)
> #> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)
> #> dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
> #> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)
> #> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
> #> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
> #> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
> #> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)
> #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)
> #> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.0)
> #> knitr 1.39 2022-04-26 [1] CRAN (R 4.2.0)
> #> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)
> #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
> #> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)
> #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
> #> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.0)
> #> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)
> #> rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.2.0)
> #> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)
> #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
> #> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)
> #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)
> #> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)
> #> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)
> #> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)
> #> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
> #> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)
> #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
> #> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)
> #> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
> #>
> #> [1] C:/Users/sbashevkin/AppData/Local/R/win-library/4.2
> #> [2] C:/Program Files/R/R-4.2.0/library
> #>
> #> ──────────────────────────────────────────────────────────────────────────────
> ```
> </details>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)