[ https://issues.apache.org/jira/browse/ARROW-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson updated ARROW-10305:
------------------------------------
    Fix Version/s: 4.0.0

> [R] Filter with regular expressions
> -----------------------------------
>
>                 Key: ARROW-10305
>                 URL: https://issues.apache.org/jira/browse/ARROW-10305
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, R
>            Reporter: Pal
>            Priority: Major
>             Fix For: 4.0.0
>
>
> Hi,
> Some expressions, such as substr(), grepl(), str_detect() and others, are not
> supported when filtering a dataset (after open_dataset()). Specifically,
> the code below:
> {code:java}
> library(dplyr)
> library(arrow)
> data = data.frame(a = c("a", "a2", "a3"))
> # write_parquet() does not create missing directories
> dir.create("Test_filter", showWarnings = FALSE)
> write_parquet(data, "Test_filter/data.parquet")
> ds <- open_dataset("Test_filter/")
> data_flt <- ds %>%
>   filter(substr(a, 1, 1) == "a")
> {code}
> gives this error:
> {code:java}
> Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
> Call collect() first to pull data into R.{code}
> These expressions would be very helpful, if not necessary, for filtering and
> collecting a very large dataset. Could this feature be implemented?
> Thank you.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
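Until such expressions are supported, the error message itself points to a workaround: call `collect()` first so that dplyr evaluates `substr()` in R. The sketch below assumes the arrow and dplyr versions from the report; the temporary directory path is a hypothetical stand-in for `Test_filter/`. Note that this loads the whole dataset into memory, which is exactly what the reporter wants to avoid for very large data, so it is a stopgap rather than a substitute for the requested feature.

```r
library(dplyr)
library(arrow)

# Hypothetical location for the sample dataset; any writable directory works.
dataset_dir <- file.path(tempdir(), "Test_filter")
dir.create(dataset_dir, showWarnings = FALSE)

# Same sample data as in the report (stringsAsFactors set explicitly for R < 4.0).
data <- data.frame(a = c("a", "a2", "a3"), stringsAsFactors = FALSE)
write_parquet(data, file.path(dataset_dir, "data.parquet"))

ds <- open_dataset(dataset_dir)

# collect() materializes the dataset as an in-memory data frame, so the
# filter() below runs in dplyr, where substr() is available, not in Arrow.
data_flt <- ds %>%
  collect() %>%
  filter(substr(a, 1, 1) == "a")

print(data_flt)
```

Predicates that Arrow can already evaluate, such as plain comparisons on a column, can still be placed before `collect()` to reduce how much data is pulled into R.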