jonkeane commented on a change in pull request #9909:
URL: https://github.com/apache/arrow/pull/9909#discussion_r607945460



##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -23,6 +23,11 @@ expect_data_frame <- function(x, y, ...) {
   expect_equal(as.data.frame(x), y, ...)
 }
 
+expect_r6_class <- function(object, class){
+  expect_s3_class(object, class)
+  expect_s3_class(object, c("R6"))

Review comment:
       This is super minor, but we don't need the `c(` and `)` here.

##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -272,7 +272,7 @@ test_that("filter() with string ops", {
 
 test_that("filter environment scope", {
   # "object 'b_var' not found"
-  expect_dplyr_error(input %>% filter(batch, chr == b_var))
+  expect_dplyr_error(filter(batch, chr == b_var), tbl)

Review comment:
       This one ended up being more complicated than I realized! It turns out 
that both `expect_dplyr_error()` had issues *and* these tests did. This isn't 
at all your fault, these tests weren't doing the right thing 
   
   We want to keep the `input %>%` part here, that [gets (tidily) 
evaled](https://github.com/thisisnic/arrow/blob/arrow-11752/r/tests/testthat/helper-expectation.R#L111)
 and replaces the `input` with whatever is being given in `tbl`.
   
   Additionally, that `batch` that is here when we started is just plain wrong. 
I looked through git history to figure out why I put it there, but cannot find 
a compelling reason or explanation.
   
    What we want on this line is something like:
    
    ```
      expect_dplyr_error(input %>% filter(chr == b_var), tbl)
    ```

##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -98,12 +103,16 @@ expect_dplyr_equal <- function(expr, # A dplyr pipeline 
with `input` as its star
 expect_dplyr_error <- function(expr, # A dplyr pipeline with `input` as its 
start
                                tbl,  # A tbl/df as reference, will make 
RB/Table with
                                ...) {
+  # ensure we have supplied tbl
+  tbl

Review comment:
       We might consider using `force()` here. It won't do anything different 
as far as I know, but it's what I've seen elsewhere used to mark that this 
object should be evaluated here.

##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -98,12 +103,16 @@ expect_dplyr_equal <- function(expr, # A dplyr pipeline 
with `input` as its star
 expect_dplyr_error <- function(expr, # A dplyr pipeline with `input` as its 
start
                                tbl,  # A tbl/df as reference, will make 
RB/Table with
                                ...) {
+  # ensure we have supplied tbl
+  tbl
+  
   expr <- rlang::enquo(expr)
   msg <- tryCatch(
     rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(input = tbl))),
     error = function (e) conditionMessage(e)

Review comment:
       ```suggestion
       error = function (e) {
             msg <- conditionMessage(e)
   
         # The error here is of the form:
         #
         # Problem with `filter()` input `..1`.
         # x object 'b_var' not found
         # ℹ Input `..1` is `chr == b_var`.
         #
         # but what we really care about is the `x` block
         # so (temporarily) let's pull those blocks out when we find them
         if (grepl("object '.*'.not.found", msg)) {
           return(gsub("^.*(object '.*'.not found).*$", "\\1", msg))
         }
         if (grepl('could not find function ".*"', msg)) {
           return(gsub('^.*(could not find function ".*").*$', "\\1", msg))
         }
         invisible(structure(msg, class = "try-error", condition = e))
       }
   ```

##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -98,12 +103,16 @@ expect_dplyr_equal <- function(expr, # A dplyr pipeline 
with `input` as its star
 expect_dplyr_error <- function(expr, # A dplyr pipeline with `input` as its 
start

Review comment:
       I did a bunch of digging through this + through git history to figure 
out what is going on here, and `expect_dplyr_error()` has a few issues on top 
of the one that you've caught here. This setup is super confusing since both 
the helper and the tests were wrong in different ways, making debugging this 
super hard (congrats on getting as much as you did here on your first PR, I 
wouldn't have gotten close to this!)
   
   Ultimately what we want is for `expect_dplyr_error()` to evaluate the 
expression both as a standard R / dplyr based expression and then also as an 
arrow-backed dplyr expression and then we compare that the errors we get from 
the pure-dplyr version are the same as the errors we get from the arrow version.
    
    _it turns out_ that the way that dplyr errors / the way that we catch them 
here isn't quite right and so the tests below will fail.
    
    I resurrected the code I suggest below from git history + some gsubing on 
the errors. The crux of the issue so far is:
   
   dplyr returns on error with the form:
   ```
   Problem with `filter()` input `..1`.
   x object 'b_var' not found
   ℹ Input `..1` is `chr == b_var`.
   ```
   
   but what we really care about is just the `x` block:
   `object 'b_var' not found`
   
   Which is all that arrow returns, but that should be good enough for us to 
consider them having the same error behavior.
   
   The code I added below is probably not the right way to do this (since we 
would only be able to test a handful of error types this way) but I wanted to 
confirm that was what was going on. 

##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -389,11 +390,14 @@ test_that("filter() with .data pronoun", {
     tbl
   )
 
+  skip("test now faulty as code no longer gives error")
   # but there is an error if we don't override the masking with `.env`
   expect_dplyr_error(
     tbl %>%

Review comment:
       This `tbl %>%` we'll want to change to `input %>%` as well.

##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -283,7 +283,8 @@ test_that("filter environment scope", {
   )
   # Also for functions
   # 'could not find function "isEqualTo"' because we haven't defined it yet
-  expect_dplyr_error(filter(batch, isEqualTo(int, 4)))
+  expect_dplyr_error(filter(batch, isEqualTo(int, 4)), tbl)

Review comment:
       ```suggestion
     expect_dplyr_error(input %>% filter(isEqualTo(int, 4)), tbl)
   ```
   
   Like above, this test was wrong to start with, something like this is what 
we want here.

##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -389,11 +390,14 @@ test_that("filter() with .data pronoun", {
     tbl
   )
 
+  skip("test now faulty as code no longer gives error")

Review comment:
       When you come back to this, could you put here what the output is that 
you get when you unskip this? I don't think that the changes we made above to 
`expect_dplyr_error` will magically fix this, but I think there is something 
else going on here.

##########
File path: r/tests/testthat/test-read-write.R
##########
@@ -119,7 +119,7 @@ test_that("reading/writing a raw vector (sparklyr 
integration)", {
     as.data.frame(RecordBatchStreamReader$create(x)$read_next_batch())
   }
   bytes <- write_to_raw(example_data)
-  expect_is(bytes, "raw")
+  expect_true(is.raw(bytes))

Review comment:
       Could we use `expect_type()` here?

##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -23,6 +23,11 @@ expect_data_frame <- function(x, y, ...) {
   expect_equal(as.data.frame(x), y, ...)
 }
 
+expect_r6_class <- function(object, class){

Review comment:
       Nice helper!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to