[
https://issues.apache.org/jira/browse/ARROW-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neal Richardson updated ARROW-10305:
------------------------------------
Component/s: (was: C++)
> [R] Filter with regular expressions
> -----------------------------------
>
> Key: ARROW-10305
> URL: https://issues.apache.org/jira/browse/ARROW-10305
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Pal
> Priority: Major
> Fix For: 4.0.0
>
>
> Hi,
> Some functions, such as substr(), grepl(), and str_detect(), are not
> supported when filtering a dataset opened with open_dataset(). Specifically,
> the code below:
> {code:java}
> library(dplyr)
> library(arrow)
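> data <- data.frame(a = c("a", "a2", "a3"))
> # write_parquet() does not create missing directories
> dir.create("Test_filter", showWarnings = FALSE)
> write_parquet(data, "Test_filter/data.parquet")
> ds <- open_dataset("Test_filter/")
> data_flt <- ds %>%
>   filter(substr(a, 1, 1) == "a")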
> {code}
> gives this error:
> {code:java}
> Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) ==
> "a"
> Call collect() first to pull data into R.{code}
> These expressions can be very helpful, if not necessary, when filtering and
> collecting a very large dataset. Is there anything that can be done to
> implement this new feature?
> Thank you.