paleolimbot commented on code in PR #13641:
URL: https://github.com/apache/arrow/pull/13641#discussion_r927933364
##########
r/tests/testthat/test-dataset.R:
##########
@@ -713,6 +713,26 @@ test_that("head/tail", {
expect_error(tail(ds, -1)) # Not yet implemented
})
+
+test_that("unique returns data.frames", {
+ ds <- open_dataset(dataset_dir)
+ in_r_mem <- rbind(df1, df2)
+
+ expect_s3_class(unique(ds), "data.frame")
+ ## order not set by distinct so some sorting required
+ expect_equal(sort(unique(ds)$int), sort(unique(in_r_mem)$int))
+
+ ## on a arrow_dplyr_query
Review Comment:
```suggestion
# on a arrow_dplyr_query
```
##########
r/tests/testthat/test-dataset.R:
##########
@@ -713,6 +713,26 @@ test_that("head/tail", {
expect_error(tail(ds, -1)) # Not yet implemented
})
+
+test_that("unique returns data.frames", {
+ ds <- open_dataset(dataset_dir)
+ in_r_mem <- rbind(df1, df2)
+
+ expect_s3_class(unique(ds), "data.frame")
+ ## order not set by distinct so some sorting required
Review Comment:
```suggestion
# order not set by distinct so some sorting required
```
##########
r/tests/testthat/test-dataset.R:
##########
@@ -713,6 +713,26 @@ test_that("head/tail", {
expect_error(tail(ds, -1)) # Not yet implemented
})
+
+test_that("unique returns data.frames", {
+ ds <- open_dataset(dataset_dir)
+ in_r_mem <- rbind(df1, df2)
+
+ expect_s3_class(unique(ds), "data.frame")
+ ## order not set by distinct so some sorting required
+ expect_equal(sort(unique(ds)$int), sort(unique(in_r_mem)$int))
+
+ ## on a arrow_dplyr_query
+ adq_eg <- ds %>%
+ select(fct) %>%
+ unique()
+ expect_s3_class(adq_eg, "data.frame")
+
+ expect_equal(unique(arrow_table(in_r_mem)), unique(in_r_mem))
+ expect_equal(unique(as_record_batch_reader(in_r_mem)), unique(in_r_mem))
+})
Review Comment:
I would also expect a test for the error cases (e.g.,
`expect_snapshot_error(unique(..., incomparables = TRUE))`).
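A rough sketch of what such error tests could look like. To keep it runnable without arrow, it uses a hypothetical stand-in (`unique_stub`) and plain `expect_error()` rather than a snapshot; the error messages are assumptions for illustration, not the actual output of `arrow_not_supported()`.

```r
library(testthat)

# Hypothetical stand-in for unique.arrow_dplyr_query(); the messages
# below are assumed for illustration only.
unique_stub <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
  if (isTRUE(incomparables)) {
    stop("`unique()` with `incomparables = TRUE` not supported", call. = FALSE)
  }
  if (isTRUE(fromLast)) {
    stop("`unique()` with `fromLast = TRUE` not supported", call. = FALSE)
  }
  unique(x)
}

test_that("unique() errors on unsupported arguments", {
  # One expectation per unsupported argument, matched on the message text
  expect_error(unique_stub(1:3, incomparables = TRUE), "incomparables = TRUE")
  expect_error(unique_stub(1:3, fromLast = TRUE), "fromLast = TRUE")
})
```

In the real test file the calls would target `unique(ds, ...)`, and `expect_snapshot_error()` would record the exact message instead of matching a pattern.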
##########
r/R/dplyr.R:
##########
@@ -184,6 +184,29 @@ dim.arrow_dplyr_query <- function(x) {
c(rows, cols)
}
+#' @export
+unique.arrow_dplyr_query <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
+
+ if (incomparables == TRUE) {
+ arrow_not_supported("`unique()` with `incomparables = TRUE`")
+ }
+
+ if (fromLast == TRUE) {
+ arrow_not_supported("`unique()` with `fromLast = TRUE`")
+ }
+
+ x <- dplyr::distinct(x)
+ dplyr::collect(x)
Review Comment:
Is there a reason why we have to `collect()` here? I would have expected the
caller to need an explicit `collect()`, but I don't know if there's a precedent
(does dbplyr do this?).
##########
r/R/dplyr.R:
##########
@@ -184,6 +184,29 @@ dim.arrow_dplyr_query <- function(x) {
c(rows, cols)
}
+#' @export
+unique.arrow_dplyr_query <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
+
+ if (incomparables == TRUE) {
Review Comment:
```suggestion
if (isTRUE(incomparables)) {
```
(because the condition must evaluate to a single TRUE or FALSE; otherwise the
user gets a confusing error)
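To illustrate the point, here is a small base-R sketch comparing the two conditions. The helper names (`check_with_equals`, `check_with_istrue`) are hypothetical and exist only for this comparison.

```r
# Hypothetical helpers, for illustration only.
check_with_equals <- function(flag) {
  if (flag == TRUE) "unsupported" else "ok"
}

check_with_istrue <- function(flag) {
  if (isTRUE(flag)) "unsupported" else "ok"
}

# `NA == TRUE` is NA, and `if (NA)` stops with
# "missing value where TRUE/FALSE needed".
na_result <- tryCatch(check_with_equals(NA), error = function(e) conditionMessage(e))
print(na_result)

# `NULL == TRUE` is logical(0), which `if ()` also rejects.
null_result <- tryCatch(check_with_equals(NULL), error = function(e) conditionMessage(e))
print(null_result)

# isTRUE() always returns a single TRUE or FALSE, so the branch is well defined.
print(check_with_istrue(NA))
print(check_with_istrue(NULL))
print(check_with_istrue(TRUE))
```

With `isTRUE()`, anything other than a single `TRUE` falls through to the supported path instead of failing inside the `if ()` condition.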