Pal created ARROW-10305:
---------------------------

             Summary: [R] Error: Filter expression not supported for Arrow 
Datasets (substr, grepl, str_detect)
                 Key: ARROW-10305
                 URL: https://issues.apache.org/jira/browse/ARROW-10305
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
    Affects Versions: 1.0.1
            Reporter: Pal


Hi,

Some expressions, such as substr(), grepl(), str_detect() or others, are not 
supported when filtering after open_dataset(). Specifically, the code below:

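{{library(dplyr)
library(arrow)

# Write a small Parquet file into a dataset directory
dir.create("Test_filter", showWarnings = FALSE)
data <- data.frame(a = c("a", "a2", "a3"))
write_parquet(data, "Test_filter/data.parquet")

# Open the directory as an Arrow Dataset
ds <- open_dataset("Test_filter/")

# Filtering on a substring of column a fails
data_flt <- ds %>% 
  filter(substr(a, 1, 1) == "a")}}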

gives this error:

{{Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == 
"a"
Call collect() first to pull data into R.}}
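
For reference, the workaround that the message suggests would look roughly like
the sketch below. It is only an illustration: the temporary directory and the
dataset_dir variable are assumptions added to keep it self-contained. It works
on this toy example, but collect() pulls every row into R before the filter is
applied.

```r
library(dplyr)
library(arrow)

# Hypothetical temporary location, used only to keep the sketch self-contained
dataset_dir <- file.path(tempdir(), "Test_filter")
dir.create(dataset_dir, showWarnings = FALSE)
write_parquet(data.frame(a = c("a", "a2", "a3")),
              file.path(dataset_dir, "data.parquet"))

ds <- open_dataset(dataset_dir)

# Workaround: pull all rows into R first, then filter with plain dplyr
data_flt <- ds %>%
  collect() %>%
  filter(substr(a, 1, 1) == "a")
```

Because the whole dataset is materialised in memory first, this does not scale
to the large datasets described in this report.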

These expressions would be very helpful, if not necessary, to filter a very 
large dataset before collecting it. Is there anything that can be done to 
implement this feature?

Thank you.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)