nealrichardson commented on issue #43440:
URL: https://github.com/apache/arrow/issues/43440#issuecomment-2254226625

   I think I see what's going on. The binding for `%in%` calls a function 
`cast_or_parse()`, which tries to align the type of the input to match the type 
of the array. In this case it casts `"a"` to a dictionary type to match the column:
   
   ```
   library(arrow)
   library(dplyr)
   
   arrow_table(x = factor(c("a", "b", "c"))) |>
     filter(x %in% "a")
   
   Table (query)
   x: dictionary<values=string, indices=int8>
   
   * Filter: is_in(x, {value_set=dictionary<values=string, indices=int8, ordered=0>:
   -- dictionary:
     [
       "a"
     ]
   -- indices:
     [
       0
     ], null_matching_behavior=SKIP})
   See $.data for the source Arrow object
   ```
   
   But a dictionary-typed value set apparently isn't what the `is_in` function 
behind `%in%` wants. If I directly build the Arrow expression so that I can 
avoid that casting, it works:
   
   ```
   arrow_table(x = factor(c("a", "b", "c"))) |>
     filter(arrow_is_in(
       x,
       options = list(value_set = Array$create("a"), skip_nulls = TRUE)
     )) |>
     collect()
   
   # A tibble: 1 × 1
     x    
     <fct>
   1 a    
   ```
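   
   For reference, the mismatch that `cast_or_parse()` is trying to reconcile 
can be seen by comparing the two types directly. This is only a minimal sketch, 
assuming the `arrow` R package is installed; the variable names are 
illustrative and it does not touch the `%in%` binding itself:
   
   ```r
   library(arrow)

   # Column built from a factor, as in the examples above
   tab <- arrow_table(x = factor(c("a", "b", "c")))
   col_type <- tab$x$type

   # A plain R string as Arrow sees it before any casting
   value_type <- Array$create("a")$type

   print(col_type$ToString())
   print(value_type$ToString())

   # FALSE: the types differ, so something has to decide how to align them
   types_match <- col_type$Equals(value_type)
   print(types_match)
   ```
   
   The working call above passes the value set in its plain string form, 
i.e. without aligning it to the dictionary type of the column.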
   
   I can put together a PR to fix that, though it does make me wonder how 
widespread this issue is. I don't think we have great test coverage for 
dictionary columns in Acero.
   

