[ https://issues.apache.org/jira/browse/ARROW-16577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537051#comment-17537051 ]
Jonathan Keane commented on ARROW-16577:
----------------------------------------
Thanks for the report! We don't currently support calling functions with the
package namespace prefix (e.g. `dplyr::n()`), though it is something we are
thinking about and plan to support (see ARROW-14575 for some discussion and
possible approaches). We don't have a timeline for this, but it helps to know
that someone is looking for it!
If you don't mind, I'm going to close this issue, but please feel free to
continue the discussion on ARROW-14575.
Thanks again!
> [R] dplyr `n` function cannot be called with `dplyr::n()`
> ---------------------------------------------------------
>
> Key: ARROW-16577
> URL: https://issues.apache.org/jira/browse/ARROW-16577
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Sam Bashevkin
> Priority: Major
>
> I am trying to summarize an arrow dataset in R using the `n` function from
> dplyr, but I noticed that it does not work when called via the `dplyr::n()`
> syntax, even though it works fine as plain `n()`. I also tried the
> `n_distinct` function and hit the same issue.
> ``` r
> library(arrow)
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #> timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #> intersect, setdiff, setequal, union
> dir <- file.path(tempdir(), "test-data")
> test_data <- data.frame(A = 1:10)
> write_dataset(test_data, dir)
> # This does work
> data2 <- open_dataset(dir) %>%
>   summarise(N = n())
> data2
> #> FileSystemDataset (query)
> #> N: int32
> #>
> #> See $.data for the source Arrow object
> collect(data2)
> #> # A tibble: 1 × 1
> #> N
> #> <int>
> #> 1 10
> # But this does not work
> data1 <- open_dataset(dir) %>%
>   summarise(N = dplyr::n())
> #> Error: Error : Expression dplyr::n() not supported in Arrow
> #> Call collect() first to pull data into R.
> data1
> #> Error in eval(expr, envir, enclos): object 'data1' not found
> ```
> <sup>Created on 2022-05-13 by the [reprex
> package](https://reprex.tidyverse.org) (v2.0.1)</sup>
> <details style="margin-bottom:10px;">
> <summary>
> Session info
> </summary>
> ``` r
> sessioninfo::session_info()
> #> ─ Session info ───────────────────────────────────────────────────────────────
> #> setting value
> #> version R version 4.2.0 (2022-04-22 ucrt)
> #> os Windows 10 x64 (build 19044)
> #> system x86_64, mingw32
> #> ui RTerm
> #> language (EN)
> #> collate English_United States.utf8
> #> ctype English_United States.utf8
> #> tz America/Los_Angeles
> #> date 2022-05-13
> #> pandoc 2.17.1.1 @ C:/Program Files/RStudio/bin/quarto/bin/ (via rmarkdown)
> #>
> #> ─ Packages ───────────────────────────────────────────────────────────────────
> #> package * version date (UTC) lib source
> #> arrow * 8.0.0 2022-05-09 [1] CRAN (R 4.2.0)
> #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
> #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.2.0)
> #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.0)
> #> cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)
> #> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)
> #> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.2.0)
> #> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)
> #> dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)
> #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
> #> evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)
> #> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
> #> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
> #> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
> #> generics 0.1.2 2022-01-31 [1] CRAN (R 4.2.0)
> #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
> #> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)
> #> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.0)
> #> knitr 1.39 2022-04-26 [1] CRAN (R 4.2.0)
> #> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)
> #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
> #> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)
> #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
> #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)
> #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
> #> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.0)
> #> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.2.0)
> #> rmarkdown 2.14 2022-04-25 [1] CRAN (R 4.2.0)
> #> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)
> #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
> #> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.2.0)
> #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)
> #> tibble 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)
> #> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)
> #> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)
> #> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
> #> vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)
> #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
> #> xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)
> #> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
> #>
> #> [1] C:/Users/sbashevkin/AppData/Local/R/win-library/4.2
> #> [2] C:/Program Files/R/R-4.2.0/library
> #>
> #> ──────────────────────────────────────────────────────────────────────────────
> ```
> </details>