[
https://issues.apache.org/jira/browse/ARROW-11925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neal Richardson reassigned ARROW-11925:
---------------------------------------
Assignee: Neal Richardson
> [R] Add `between` method for arrow_dplyr_query
> ----------------------------------------------
>
> Key: ARROW-11925
> URL: https://issues.apache.org/jira/browse/ARROW-11925
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Sam Albers
> Assignee: Neal Richardson
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Would you consider a PR to add a `between` method for `arrow_dplyr_query`
> objects? Even an implementation written directly in R would still benefit
> from Arrow's speed. Here is what I am thinking.
>
> Typical usage of `between`:
>
> {code:java}
> library(dplyr)
> library(arrow)
> iris %>% filter(between(Petal.Length, 1, 1.1)){code}
>
> Here is a mocked up version of the method:
>
> {code:java}
> between_mock <- function(x, left, right) {
>   if (length(left) != 1) {
>     rlang::abort("`left` must be length 1")
>   }
>   if (length(right) != 1) {
>     rlang::abort("`right` must be length 1")
>   }
>   x >= left & x <= right
> }{code}
> I think that because `dplyr` implements `between` in C++ for efficiency, it
> doesn't work out of the box on Arrow datasets:
> {code:java}
> open_dataset("nyc-taxi", partitioning = "year") %>%
>   filter(year == 2014) %>%
>   select(year, fare_amount) %>%
>   filter(between(fare_amount, 10, 11)) %>%
>   collect()
> Error: Filter expression not supported for Arrow Datasets:
> between(fare_amount, 10, 11)
> Call collect() first to pull data into R.
> In addition: Warning message:
> between() called on numeric vector with S3 class
> Backtrace:
> x
> 1. +-[ `%>%`(...) ]
> 2. +-[ dplyr::collect(...) ]
> 3. +-[ dplyr::filter(...) ]
> 4. \-arrow:::filter.arrow_dplyr_query(...){code}
> But even my simple implementation works fine:
> {code:java}
> open_dataset("nyc-taxi", partitioning = "year") %>%
>   filter(year == 2014) %>%
>   select(year, fare_amount) %>%
>   filter(between_mock(fare_amount, 10, 11)) %>%
>   collect(){code}
>
>
>
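The behaviour of the mocked-up helper in the description can be checked on a plain numeric vector, without Arrow or rlang installed. What follows is only a minimal sketch: the name `between_sketch`, the sample `fares` vector, and the use of base `stop()` in place of `rlang::abort()` are assumptions made for illustration and are not part of the proposal.

```r
# Hypothetical stand-in for the between_mock() helper from the description.
# Base stop() replaces rlang::abort() so the block has no package dependencies.
between_sketch <- function(x, left, right) {
  if (length(left) != 1) {
    stop("`left` must be length 1")
  }
  if (length(right) != 1) {
    stop("`right` must be length 1")
  }
  # Inclusive on both ends; built only from >=, <= and &.
  x >= left & x <= right
}

# Made-up fare values, including both boundaries and a missing value.
fares <- c(9.5, 10, 10.5, 11, 11.5, NA)
in_range <- between_sketch(fares, 10, 11)
print(in_range)
```

Both bounds are inclusive, and a missing value stays missing rather than becoming `FALSE`. Since the body uses nothing beyond comparison and logical operators, that may be why the mock-up works inside `filter()` on a dataset, though this is an inference from the report and not something it states.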