jhorstmann commented on pull request #9038:
URL: https://github.com/apache/arrow/pull/9038#issuecomment-753397568
Semantics for the `IN` and `OR` versions should be the same *in SQL*, for
example [try the following query in postgres][1]:
```
SELECT 'abc' in ('abc', 'def')
, 'abc' in ('abc', 'def', null)
, 'abc' in ('def', null)
, null in ('abc', 'def')
-- same expressions rewritten using OR
, (('abc' = 'abc') OR ('abc' = 'def'))
, (('abc' = 'abc') OR ('abc' = 'def') or ('abc' = null))
, (('abc' = 'def') or ('abc' = null))
, ((null = 'abc') OR (null = 'def'))
```
1. result is `true`, no nulls involved
2. result is `true`, additional `null` on the rhs does not change this
since `(true OR null) = true`
3. result is `null` since `(false OR null) IS NULL`
4. result is `null` since lhs of each comparison is null
But you are right, it's not that simple since the arrow `or` / `and` kernels
currently do not follow this sql behaviour. We might want to change that or
rather introduce separate boolean kernels with the sql behaviour regarding
nulls.
[1]: http://sqlfiddle.com/#!17/9eecb/67863
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]