jonkeane commented on a change in pull request #9909:
URL: https://github.com/apache/arrow/pull/9909#discussion_r607945460
##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -23,6 +23,11 @@ expect_data_frame <- function(x, y, ...) {
expect_equal(as.data.frame(x), y, ...)
}
+expect_r6_class <- function(object, class){
+ expect_s3_class(object, class)
+ expect_s3_class(object, c("R6"))
Review comment:
This is super minor, but we don't need the `c(` and `)` here.
##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -272,7 +272,7 @@ test_that("filter() with string ops", {
test_that("filter environment scope", {
# "object 'b_var' not found"
- expect_dplyr_error(input %>% filter(batch, chr == b_var))
+ expect_dplyr_error(filter(batch, chr == b_var), tbl)
Review comment:
This one ended up being more complicated than I realized! It turns out
that both `expect_dplyr_error()` had issues *and* these tests did. This isn't
at all your fault, these tests weren't doing the right thing
We want to keep the `input %>%` part here, that [gets (tidily)
evaled](https://github.com/thisisnic/arrow/blob/arrow-11752/r/tests/testthat/helper-expectation.R#L111)
and replaces the `input` with whatever is being given in `tbl`.
Additionally, that `batch` that is here when we started is just plain wrong.
I looked through git history to figure out why I put it there, but cannot find
a compelling reason or explanation.
What we want on this line is something like:
```
expect_dplyr_error(input %>% filter(chr == b_var), tbl)
```
##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -98,12 +103,16 @@ expect_dplyr_equal <- function(expr, # A dplyr pipeline
with `input` as its star
expect_dplyr_error <- function(expr, # A dplyr pipeline with `input` as its
start
tbl, # A tbl/df as reference, will make
RB/Table with
...) {
+ # ensure we have supplied tbl
+ tbl
Review comment:
We might consider using `force()` here. It won't do anything different
as far as I know, but it's what I've seen elsewhere used to mark that this
object should be evaluated here.
##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -98,12 +103,16 @@ expect_dplyr_equal <- function(expr, # A dplyr pipeline
with `input` as its star
expect_dplyr_error <- function(expr, # A dplyr pipeline with `input` as its
start
tbl, # A tbl/df as reference, will make
RB/Table with
...) {
+ # ensure we have supplied tbl
+ tbl
+
expr <- rlang::enquo(expr)
msg <- tryCatch(
rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(input = tbl))),
error = function (e) conditionMessage(e)
Review comment:
```suggestion
error = function (e) {
msg <- conditionMessage(e)
# The error here is of the form:
#
# Problem with `filter()` input `..1`.
# x object 'b_var' not found
# ℹ Input `..1` is `chr == b_var`.
#
# but what we really care about is the `x` block
# so (temporarily) let's pull those blocks out when we find them
if (grepl("object '.*'.not.found", msg)) {
return(gsub("^.*(object '.*'.not found).*$", "\\1", msg))
}
if (grepl('could not find function ".*"', msg)) {
return(gsub('^.*(could not find function ".*").*$', "\\1", msg))
}
invisible(structure(msg, class = "try-error", condition = e))
}
```
##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -98,12 +103,16 @@ expect_dplyr_equal <- function(expr, # A dplyr pipeline
with `input` as its star
expect_dplyr_error <- function(expr, # A dplyr pipeline with `input` as its
start
Review comment:
I did a bunch of digging through this + through git history to figure
out what is going on here, and `expect_dplyr_error()` has a few issues on top
of the one that you've caught here. This setup is super confusing since both
the helper and the tests were wrong in different ways, making debugging this
super hard (congrats on getting as much as you did here on your first PR, I
wouldn't have gotten close to this!)
Ultimately what we want is for `expect_dplyr_error()` to evaluate the
expression both as a standard R / dplyr based expression and then also as an
arrow-backed dplyr expression and then we compare that the errors we get from
the pure-dplyr version are the same as the errors we get from the arrow version.
_it turns out_ that the way that dplyr errors / the way that we catch them
here isn't quite right and so the tests below will fail.
I resurrected the code I suggest below from git history + some gsubing on
the errors. The crux of the issue so far is:
dplyr returns on error with the form:
```
Problem with `filter()` input `..1`.
x object 'b_var' not found
ℹ Input `..1` is `chr == b_var`.
```
but what we really care about is just the `x` block:
`object 'b_var' not found`
Which is all that arrow returns, but that should be good enough for us to
consider them having the same error behavior.
The code I added below is probably not the right way to do this (since we
would only be able to test a handful of error types this way) but I wanted to
confirm that was what was going on.
##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -389,11 +390,14 @@ test_that("filter() with .data pronoun", {
tbl
)
+ skip("test now faulty as code no longer gives error")
# but there is an error if we don't override the masking with `.env`
expect_dplyr_error(
tbl %>%
Review comment:
This `tbl %>%` we'll want to change to `input %>%` as well.
##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -283,7 +283,8 @@ test_that("filter environment scope", {
)
# Also for functions
# 'could not find function "isEqualTo"' because we haven't defined it yet
- expect_dplyr_error(filter(batch, isEqualTo(int, 4)))
+ expect_dplyr_error(filter(batch, isEqualTo(int, 4)), tbl)
Review comment:
```suggestion
expect_dplyr_error(input %>% filter(isEqualTo(int, 4)), tbl)
```
Like above, this test was wrong to start with, something like this is what
we want here.
##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -389,11 +390,14 @@ test_that("filter() with .data pronoun", {
tbl
)
+ skip("test now faulty as code no longer gives error")
Review comment:
When you come back to this, could you put here what the output is that
you get when you unskip this? I don't think that the changes we made above to
`expect_dplyr_error` will magically fix this, but I think there is something
else going on here.
##########
File path: r/tests/testthat/test-read-write.R
##########
@@ -119,7 +119,7 @@ test_that("reading/writing a raw vector (sparklyr
integration)", {
as.data.frame(RecordBatchStreamReader$create(x)$read_next_batch())
}
bytes <- write_to_raw(example_data)
- expect_is(bytes, "raw")
+ expect_true(is.raw(bytes))
Review comment:
Could we use `expect_type()` here?
##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -23,6 +23,11 @@ expect_data_frame <- function(x, y, ...) {
expect_equal(as.data.frame(x), y, ...)
}
+expect_r6_class <- function(object, class){
Review comment:
Nice helper!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]