[ 
https://issues.apache.org/jira/browse/ARROW-14324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436045#comment-17436045
 ] 

Sam Albers commented on ARROW-14324:
------------------------------------

Right. I should have included that one too. That example does not work for me 
either: 
{code:java}
open_dataset(tf, schema = tf_reg) %>%
   filter(stations == "41") %>%
   collect()
Error: NotImplemented: Function equal has no kernel matching input types (scalar[string], scalar[int32])
Backtrace:
    x
 1. +-[ `%>%`(...) ]
 2. +-[ dplyr::collect(...) ]
 3. \-arrow:::collect.arrow_dplyr_query(.)
 4.   \-Scanner$create(x)$ToTable()
 5.     \-arrow:::dataset___Scanner__ToTable(self)

packageVersion("arrow")
[1] '5.0.0.2'
{code}

> [R] Inconsistent application of type in Datasets via the schema
> ---------------------------------------------------------------
>
>                 Key: ARROW-14324
>                 URL: https://issues.apache.org/jira/browse/ARROW-14324
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 5.0.0
>            Reporter: Sam Albers
>            Priority: Major
>
>  
> It looks like at least {{filter}} does not respect a column type set via 
> {{schema}} in {{open_dataset}}. Reprex:
> {code:java}
> options("max.print" = 5)
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> ## Set up the data
> tf <- tempfile()
> dir.create(tf)
> write_dataset(quakes, tf)
> ## Works as expected
> open_dataset(tf) %>%
>  filter(stations == 41) %>%
>  collect()
> #> lat long depth mag stations
> #> 1 -20.42 181.62 562 4.8 41
> #> [ reached 'max' / getOption("max.print") -- omitted 11 rows ]
> ## errors as expected
> open_dataset(tf) %>%
>  filter(stations == "41") %>%
>  collect()
> #> Error: NotImplemented: Function equal has no kernel matching input types (array[int32], scalar[string])
> ## Ok let's change a column type
> tf_reg <- open_dataset(tf)$schema
> tf_reg$stations <- string()
> ## ok returns a character
> open_dataset(tf, schema = tf_reg) %>%
>  pull(stations) %>%
>  typeof()
> #> [1] "character"
> ## So if `stations` is character I think this should work?
> open_dataset(tf, schema = tf_reg) %>%
>  filter(stations == as.character("41")) %>%
>  collect()
> #> Error: Filter expression not supported for Arrow Datasets: stations == as.character("41")
> #> Call collect() first to pull data into R.
> ## previous behaviour no longer works
> open_dataset(tf, schema = tf_reg) %>%
>  filter(stations == 41) %>%
>  collect()
> #> Error: NotImplemented: Function equal has no kernel matching input types (array[string], scalar[double])
>  
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
