[
https://issues.apache.org/jira/browse/ARROW-11925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299101#comment-17299101
]
Neal Richardson commented on ARROW-11925:
-----------------------------------------
Sure! You'd register the function around here:
https://github.com/apache/arrow/blob/master/r/R/dplyr.R#L292
> Add `between` method for arrow_dplyr_query
> ------------------------------------------
>
> Key: ARROW-11925
> URL: https://issues.apache.org/jira/browse/ARROW-11925
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Sam Albers
> Priority: Minor
>
> Would you consider a PR to add a `between` method for `arrow_dplyr_query`
> objects? Even a version implemented directly in R would benefit from Arrow's speed.
> Here is what I am thinking.
> Typical usage of `between`:
>
> {code:java}
> library(dplyr)
> library(arrow)
> iris %>% filter(between(Petal.Length, 1, 1.1)){code}
>
> Here is a mocked up version of the method:
>
> {code:java}
> between_mock <- function(x, left, right) {
>   if (length(left) != 1) {
>     rlang::abort("`left` must be length 1")
>   }
>   if (length(right) != 1) {
>     rlang::abort("`right` must be length 1")
>   }
>   x >= left & x <= right
> }{code}
> I think because `dplyr` uses C++ to efficiently do this, `between` doesn't
> work out of the box:
> {code:java}
> open_dataset("nyc-taxi", partitioning = "year") %>%
>   filter(year == 2014) %>%
>   select(year, fare_amount) %>%
>   filter(between(fare_amount, 10, 11)) %>%
>   collect()
> Error: Filter expression not supported for Arrow Datasets:
> between(fare_amount, 10, 11)
> Call collect() first to pull data into R.
> In addition: Warning message:
> between() called on numeric vector with S3 class
> Backtrace:
> x
> 1. +-[ `%>%`(...) ]
> 2. +-[ dplyr::collect(...) ]
> 3. +-[ dplyr::filter(...) ]
> 4. \-arrow:::filter.arrow_dplyr_query(...){code}
> But even my simple implementation works fine:
> {code:java}
> open_dataset("nyc-taxi", partitioning = "year") %>%
>   filter(year == 2014) %>%
>   select(year, fare_amount) %>%
>   filter(between_mock(fare_amount, 10, 11)) %>%
>   collect() {code}
>
>
>
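As a rough illustration of why the mocked-up version in the quoted description can succeed where `between()` is rejected, here is a minimal base R sketch. The name `between_sketch` is hypothetical, and the sketch assumes nothing about arrow internals. It only shows that the body is built from `>=`, `<=` and `&`, which are ordinary operators that other classes can supply their own methods for. That is presumably why the dataset filter can handle it.

```r
# Hypothetical helper mirroring the mock-up in the quoted issue;
# it is not part of dplyr or arrow.
between_sketch <- function(x, left, right) {
  # Both bounds must be scalars, as in the mock-up
  stopifnot(length(left) == 1, length(right) == 1)
  # Only comparison and logical operators are used, so any class
  # that provides methods for them can be passed as `x`
  x >= left & x <= right
}

# Plain numeric vector standing in for a column; NA propagates
petal_length <- c(1.0, 1.05, 1.4, NA)
between_sketch(petal_length, 1, 1.1)
#> [1]  TRUE  TRUE FALSE    NA
```

On an ordinary vector this gives the same inclusive-bounds result one would expect from `between`, so the difference in the quoted issue lies in how the expression is evaluated, not in what it computes.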