PalGal2 opened a new issue #8454:
URL: https://github.com/apache/arrow/issues/8454


   Hi,
   
   Some expressions, such as substr(), grepl(), str_detect() and others, are
   not supported inside filter() after open_dataset(). Specifically, the code below:
   
   ```
   library(dplyr)
   library(arrow)
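   data <- data.frame(a = c("a", "a2", "a3"))
   # write_parquet() does not create missing directories, so create it first
   dir.create("Test_filter", showWarnings = FALSE)
   write_parquet(data, "Test_filter/data.parquet")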
   
   ds <- open_dataset("Test_filter/")
   
   data_flt <- ds %>% 
     filter(substr(a, 1, 1) == "a")
   ```
   
   gives this error:
   
   ```
   Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
   Call collect() first to pull data into R.
   ```
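   
   For reference, the workaround the message points to would look roughly like the sketch below. It only assumes the functions already used above; the temporary directory name is made up for illustration. It pulls the whole dataset into R before filtering, which is exactly what I would like to avoid on a very large dataset.
   
   ```r
   library(dplyr)
   library(arrow)

   # Hypothetical scratch directory, used only for this sketch
   dir_path <- file.path(tempdir(), "Test_filter_sketch")
   dir.create(dir_path, showWarnings = FALSE)

   data <- data.frame(a = c("a", "a2", "a3"), stringsAsFactors = FALSE)
   write_parquet(data, file.path(dir_path, "data.parquet"))

   ds <- open_dataset(dir_path)

   # collect() materializes every row in memory first;
   # the substr() filter then runs in plain dplyr, not in Arrow
   data_flt <- ds %>%
     collect() %>%
     filter(substr(a, 1, 1) == "a")
   ```
   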
   These expressions would be very helpful, if not necessary, for filtering a
   very large dataset before collecting it. Is there anything that can be done
   to implement this feature?
   
   Thank you. 

