jhorstmann commented on pull request #9038: URL: https://github.com/apache/arrow/pull/9038#issuecomment-753397568
Semantics for the `IN` and `OR` versions should be the same *in SQL*, for example [try the following query in postgres][1]: ``` SELECT 'abc' in ('abc', 'def') , 'abc' in ('abc', 'def', null) , 'abc' in ('def', null) , null in ('abc', 'def') -- same expressions rewritten using OR , (('abc' = 'abc') OR ('abc' = 'def')) , (('abc' = 'abc') OR ('abc' = 'def') or ('abc' = null)) , (('abc' = 'def') or ('abc' = null)) , ((null = 'abc') OR (null = 'def')) ``` 1. result is `true`, no nulls involved 2. result is `true`, additional `null` on the rhs does not change this since `(true OR null) = true` 3. result is `null` since `(false OR null) IS NULL` 4. result is `null` since lhs of each comparison is null But you are right, it's not that simple since the arrow `or` / `and` kernels currently do not follow this sql behaviour. We might want to change that or rather introduce separate boolean kernels with the sql behaviour regarding nulls. [1]: http://sqlfiddle.com/#!17/9eecb/67863 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org