Sam Albers created ARROW-12693:
----------------------------------

             Summary: Usage of compute functions - Use case of the unique function
                 Key: ARROW-12693
                 URL: https://issues.apache.org/jira/browse/ARROW-12693
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Sam Albers


I am trying to see if I can leverage `unique` on a Dataset object. Imagining a 
much bigger dataset, I am trying to get away from this expensive pattern:
{code:r}
Dataset %>%
  pull(col) %>%
  unique(){code}
However, the options below do not work the way I'd expect. In fact, I can't get 
any of the compute functions to work on a Dataset (e.g. `arrow_mean`), so maybe I 
am misunderstanding how they are meant to be used.

{code:r}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
dir.create("iris")
iris %>%
  group_by(Species) %>%
  write_dataset("iris")
ds <- open_dataset("iris")
ds %>%
  mutate(unique = arrow_unique(Species)) %>%
  collect()
#> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar expression unique("setosa")
ds %>%
  mutate(unique = arrow_unique(Petal.Width)) %>%
  collect()
#> Error: Invalid: ExecuteScalarExpression cannot Execute non-scalar expression {Sepal.Length=Sepal.Length, Sepal.Width=Sepal.Width, Petal.Length=Petal.Length, Petal.Width=Petal.Width, Species="setosa", unique=unique(Petal.Width)}

call_function("unique", ds, "Species")
#> Error: Argument 1 is of class FileSystemDataset but it must be one of "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"
call_function("unique", ds, "Petal.Width")
#> Error: Argument 1 is of class FileSystemDataset but it must be one of "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"

call_function("mean", ds, "Petal.Width")
#> Error: Argument 1 is of class FileSystemDataset but it must be one of "Array", "ChunkedArray", "RecordBatch", "Table", or "Scalar"

sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.5 (2021-03-31)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Canada.1252
#>  ctype    English_Canada.1252
#>  tz       America/Los_Angeles
#>  date     2021-05-07
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source
#>  arrow       * 4.0.0   2021-04-27 [1] CRAN (R 4.0.5)
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.3)
#>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.0.2)
#>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.0.2)
#>  cli           2.5.0   2021-04-26 [1] CRAN (R 4.0.5)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.3)
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.3)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr       * 1.0.5   2021-03-05 [1] CRAN (R 4.0.5)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.5)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.3)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.0.4)
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.0.5)
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.4)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  pillar        1.6.0   2021-04-13 [1] CRAN (R 4.0.5)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.0.5)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.0.2)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.0.2)
#>  R.utils       2.10.1  2020-08-26 [1] CRAN (R 4.0.2)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.0.5)
#>  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.3)
#>  rmarkdown     2.7     2021-02-19 [1] CRAN (R 4.0.4)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  styler        1.4.1   2021-03-30 [1] CRAN (R 4.0.4)
#>  tibble        3.1.1   2021-04-18 [1] CRAN (R 4.1.0)
#>  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.0.5)
#>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.5)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.0.4)
#>  xfun          0.22    2021-03-11 [1] CRAN (R 4.0.4)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] C:/Users/salbers/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.5/library

{code}
I am opening this a) because others may have run into the same issue and b) in 
case this is actually a bug. Feel free to close it immediately if this isn't how 
these functions are supposed to work.
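A possible interim workaround, given as a minimal sketch under stated assumptions rather than a confirmed recipe: the error messages above say `call_function()` accepts a ChunkedArray but not a FileSystemDataset, so one can select the single column, collect it as an Arrow Table (assuming `collect()` accepts `as_data_frame = FALSE`, as it does in the arrow releases I am aware of), and run the `unique` kernel on that column. The `tempfile()` directory replaces the `"iris"` directory used above so the example is self-contained; the names `path`, `tab`, `uniq` and `species` are illustrative only. This still reads the whole column into memory (as Arrow data rather than an R vector), so it is not a streaming or pushed-down distinct over the Dataset.

{code:r}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Write the same partitioned iris dataset as above, but to a temporary directory
path <- tempfile("iris")
dir.create(path)
iris %>%
  group_by(Species) %>%
  write_dataset(path)
ds <- open_dataset(path)

# Keep only the column of interest and collect it as an Arrow Table.
# Assumption: collect() on an arrow query supports as_data_frame = FALSE,
# which keeps the result in Arrow memory instead of building an R data frame.
tab <- ds %>%
  select(Species) %>%
  collect(as_data_frame = FALSE)

# tab$Species is a ChunkedArray, one of the argument types call_function() accepts
uniq <- call_function("unique", tab$Species)

# Convert only the small, deduplicated result to an R vector
species <- uniq$as_vector()
species
{code}
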



--
This message was sent by Atlassian Jira
(v8.3.4#803005)