boshek commented on a change in pull request #9674:
URL: https://github.com/apache/arrow/pull/9674#discussion_r592569210



##########
File path: r/tests/testthat/test-dplyr-filter.R
##########
@@ -155,6 +155,15 @@ test_that("filter() with %in%", {
   )
 })
 
+test_that("filter() with between()", {
+  expect_dplyr_equal(
+    input %>%
+      filter(between(dbl, 1, 2)) %>%
+      collect(),
+    tbl
+  )
+})
+

Review comment:
       Great points. Supporting vector-wise comparisons sounds like a great 
idea. However, given that the original dplyr method doesn't support that, does 
it make sense to add it to arrow? Honest question.
   
   I'm struggling a bit with the input validation for two reasons (aside from 
my limited understanding). First, I am not totally clear on how to check the 
type of a field in a `FileSystemDataset` object. If I add something like this 
to `between`:
   
   ```r
   if (!is.double(x)) {
     x <- as.numeric(x)
   }
   ```
   I get this error, presumably because it is prematurely trying to bring the 
data into R:
   > Error: Filter expression not supported for Arrow Datasets: between(x, 0.5, 1.4)
   > Call collect() first to pull data into R.
   
   Is there a way to check for types that I am missing?
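   
   One direction that might work, as a minimal sketch rather than a confirmed 
answer: read the type from the dataset's schema instead of from the data, so 
nothing has to be collected. `field_type()` is a hypothetical helper name, and 
the small in-memory dataset only stands in for a real `FileSystemDataset`; 
the sketch assumes that class exposes `$schema` in the same way.
   
   ```r
   # Hypothetical helper: look up a field's Arrow type from the schema only,
   # so no data is pulled into R.
   field_type <- function(dataset, name) {
     field <- dataset$schema$GetFieldByName(name)
     field$type$ToString()
   }
   
   # Small in-memory stand-in for a dataset on disk (assumption for the demo).
   if (requireNamespace("arrow", quietly = TRUE)) {
     tab <- arrow::Table$create(x = c(0.5, 1.4), y = c("a", "b"))
     ds <- arrow::InMemoryDataset$create(tab)
     print(field_type(ds, "x"))
     print(field_type(ds, "y"))
   }
   ```
   
   Inside `filter()` the argument `x` is not the column itself, so a check 
like this would have to happen against the schema before the expression is 
built.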
   
   Second, `dplyr::between` allows this:
   ```r
   iris %>% 
      filter(between(Sepal.Length, 3, "5"))
   ```
   That seems.... weird? IMO `arrow` actually handles this better with a more 
informative error message:
   ```r
   open_dataset("foo") %>% 
      filter(between(x, 0.5, "5")) %>% 
      collect()
   #> Error: NotImplemented: Function less_equal has no kernel matching input types (array[double], scalar[string])
   ```
   Probably super simple questions, but I am struggling with how to enforce 
the input types within the function for `FileSystemDataset` objects, 
particularly for functions like this one that map directly onto an R function. 
Once I have that figured out, I think I can sort out the validation test. 
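   
   For the bounds themselves, the validation could look roughly like the 
sketch below. `check_between_bounds()` is a hypothetical helper, not part of 
dplyr or arrow, and it assumes scalar numeric bounds as in `dplyr::between` 
(so no vector-wise comparison and no dates):
   
   ```r
   # Hypothetical helper: reject non-numeric or non-scalar bounds up front
   # instead of silently coercing them.
   check_between_bounds <- function(left, right) {
     is_bound <- function(b) is.numeric(b) && length(b) == 1L
     if (!is_bound(left) || !is_bound(right)) {
       stop("`left` and `right` must be length-one numeric values",
            call. = FALSE)
     }
     invisible(TRUE)
   }
   
   check_between_bounds(3, 5)         # passes silently
   try(check_between_bounds(3, "5"))  # errors because "5" is a character
   ```
   
   A check like this only looks at `left` and `right`, which are ordinary R 
values, so it does not need to touch the dataset at all.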




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

