jonkeane commented on a change in pull request #9909: URL: https://github.com/apache/arrow/pull/9909#discussion_r607945460
########## File path: r/tests/testthat/helper-expectation.R ########## @@ -23,6 +23,11 @@ expect_data_frame <- function(x, y, ...) { expect_equal(as.data.frame(x), y, ...) } +expect_r6_class <- function(object, class){ + expect_s3_class(object, class) + expect_s3_class(object, c("R6")) Review comment: This is super minor, but we don't need the `c(` and `)` here. ########## File path: r/tests/testthat/test-dplyr-filter.R ########## @@ -272,7 +272,7 @@ test_that("filter() with string ops", { test_that("filter environment scope", { # "object 'b_var' not found" - expect_dplyr_error(input %>% filter(batch, chr == b_var)) + expect_dplyr_error(filter(batch, chr == b_var), tbl) Review comment: This one ended up being more complicated than I realized! It turns out that both `expect_dplyr_error()` had issues *and* these tests did. This isn't at all your fault, these tests weren't doing the right thing We want to keep the `input %>%` part here, that [gets (tidily) evaled](https://github.com/thisisnic/arrow/blob/arrow-11752/r/tests/testthat/helper-expectation.R#L111) and replaces the `input` with whatever is being given in `tbl`. Additionally, that `batch` that is here when we started is just plain wrong. I looked through git history to figure out why I put it there, but cannot find a compelling reason or explanation. What we want on this line is something like: ``` expect_dplyr_error(input %>% filter(chr == b_var), tbl) ``` ########## File path: r/tests/testthat/helper-expectation.R ########## @@ -98,12 +103,16 @@ expect_dplyr_equal <- function(expr, # A dplyr pipeline with `input` as its star expect_dplyr_error <- function(expr, # A dplyr pipeline with `input` as its start tbl, # A tbl/df as reference, will make RB/Table with ...) { + # ensure we have supplied tbl + tbl Review comment: We might consider using `force()` here. It won't do anything different as far as I know, but it's what I've seen elsewhere used to mark that this object should be evaluated here. ########## File path: r/tests/testthat/helper-expectation.R ########## @@ -98,12 +103,16 @@ expect_dplyr_equal <- function(expr, # A dplyr pipeline with `input` as its star expect_dplyr_error <- function(expr, # A dplyr pipeline with `input` as its start tbl, # A tbl/df as reference, will make RB/Table with ...) { + # ensure we have supplied tbl + tbl + expr <- rlang::enquo(expr) msg <- tryCatch( rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(input = tbl))), error = function (e) conditionMessage(e) Review comment: ```suggestion error = function (e) { msg <- conditionMessage(e) # The error here is of the form: # # Problem with `filter()` input `..1`. # x object 'b_var' not found # ℹ Input `..1` is `chr == b_var`. # # but what we really care about is the `x` block # so (temporarily) let's pull those blocks out when we find them if (grepl("object '.*'.not.found", msg)) { return(gsub("^.*(object '.*'.not found).*$", "\\1", msg)) } if (grepl('could not find function ".*"', msg)) { return(gsub('^.*(could not find function ".*").*$', "\\1", msg)) } invisible(structure(msg, class = "try-error", condition = e)) } ``` ########## File path: r/tests/testthat/helper-expectation.R ########## @@ -98,12 +103,16 @@ expect_dplyr_equal <- function(expr, # A dplyr pipeline with `input` as its star expect_dplyr_error <- function(expr, # A dplyr pipeline with `input` as its start Review comment: I did a bunch of digging through this + through git history to figure out what is going on here, and `expect_dplyr_error()` has a few issues on top of the one that you've caught here. This setup is super confusing since both the helper and the tests were wrong in different ways, making debugging this super hard (congrats on getting as much as you did here on your first PR, I wouldn't have gotten close to this!) Ultimately what we want is for `expect_dplyr_error()` to evaluate the expression both as a standard R / dplyr based expression and then also as an arrow-backed dplyr expression and then we compare that the errors we get from the pure-dplyr version are the same as the errors we get from the arrow version. _it turns out_ that the way that dplyr errors / the way that we catch them here isn't quite right and so the tests below will fail. I resurrected the code I suggest below from git history + some gsubing on the errors. The crux of the issue so far is: dplyr returns on error with the form: ``` Problem with `filter()` input `..1`. x object 'b_var' not found ℹ Input `..1` is `chr == b_var`. ``` but what we really care about is just the `x` block: `object 'b_var' not found` Which is all that arrow returns, but that should be good enough for us to consider them having the same error behavior. The code I added below is probably not the right way to do this (since we would only be able to test a handful of error types this way) but I wanted to confirm that was what was going on. ########## File path: r/tests/testthat/test-dplyr-filter.R ########## @@ -389,11 +390,14 @@ test_that("filter() with .data pronoun", { tbl ) + skip("test now faulty as code no longer gives error") # but there is an error if we don't override the masking with `.env` expect_dplyr_error( tbl %>% Review comment: This `tbl %>%` we'll want to change to `input %>%` as well. ########## File path: r/tests/testthat/test-dplyr-filter.R ########## @@ -283,7 +283,8 @@ test_that("filter environment scope", { ) # Also for functions # 'could not find function "isEqualTo"' because we haven't defined it yet - expect_dplyr_error(filter(batch, isEqualTo(int, 4))) + expect_dplyr_error(filter(batch, isEqualTo(int, 4)), tbl) Review comment: ```suggestion expect_dplyr_error(input %>% filter(isEqualTo(int, 4)), tbl) ``` Like above, this test was wrong to start with, something like this is what we want here. ########## File path: r/tests/testthat/test-dplyr-filter.R ########## @@ -389,11 +390,14 @@ test_that("filter() with .data pronoun", { tbl ) + skip("test now faulty as code no longer gives error") Review comment: When you come back to this, could you put here what the output is that you get when you unskip this? I don't think that the changes we made above to `expect_dplyr_error` will magically fix this, but I think there is something else going on here. ########## File path: r/tests/testthat/test-read-write.R ########## @@ -119,7 +119,7 @@ test_that("reading/writing a raw vector (sparklyr integration)", { as.data.frame(RecordBatchStreamReader$create(x)$read_next_batch()) } bytes <- write_to_raw(example_data) - expect_is(bytes, "raw") + expect_true(is.raw(bytes)) Review comment: Could we use `expect_type()` here? ########## File path: r/tests/testthat/helper-expectation.R ########## @@ -23,6 +23,11 @@ expect_data_frame <- function(x, y, ...) { expect_equal(as.data.frame(x), y, ...) } +expect_r6_class <- function(object, class){ Review comment: Nice helper! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org