nealrichardson commented on a change in pull request #9674:
URL: https://github.com/apache/arrow/pull/9674#discussion_r592785778
##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -155,6 +155,15 @@ test_that("filter() with %in%", {
)
})
+test_that("filter() with between()", {
+ expect_dplyr_equal(
+ input %>%
+ filter(between(dbl, 1, 2)) %>%
+ collect(),
+ tbl
+ )
+})
+
Review comment:
Yeah, that's why I was suggesting that you consider removing the
validation code and seeing how the Arrow C++ library handles bad inputs. If the
arrow error messages are clear enough (looks like they may be, though they're
slightly non-obvious because of how `between` is translated into expressions),
it may be worthwhile to let the C++ library handle what it can support and tell
us if it can't, rather than try to validate twice.
I believe that a side effect of removing the validation is that it would
allow the vector-wise operation--which you're right, it's a fair question as to
whether it should be supported, but if we get it for free, it doesn't seem
unreasonable to add. But IMO the main reason to remove the validation would be
to avoid some of the complexity of type-checking arrow expressions that you
experienced. You definitely can't type-check `x` if that's supposed to be a
field in the dataset; it would be possible to check its type in the dataset's
schema if it exists there (though the usual R functions like `is.numeric` and
`as.numeric` etc. won't make sense), but because `x` could be the result of a
previous `mutate()` step, we wouldn't know until it's evaluated what type it
will be.
I'd suggest adding a bunch of tests with garbage outputs ("QA engineer walks
into a bar and orders a beer. Orders -1 beers. Orders 0.99 beers. Orders NULL
beer. etc.") and see how the C++ library handles them. If it's good enough, I'd
drop the R validation.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]