Ted-Jiang commented on issue #8688:
URL: 
https://github.com/apache/arrow-datafusion/issues/8688#issuecomment-1888374334

   > > I would like to take a look at this.
   > 
   > Thank you !
   > 
   > > If my understanding is correct, the first case i.e. `SELECT .. WHERE 
NULL IN (…)` would simplify to `SELECT .. WHERE False`.
   > 
   > I think it actually would simplify to ` SELECT .. WHERE null` (which is 
technically different than `WHERE false` but will have the same effect of 
returning no rows). This is explained a little more in #8689
   > 
   > Using datafusion-cli, you can check it:
   > 
   > ```sql
   > ❯ create table t (x int) as values (1), (2);
   > 0 rows in set. Query took 0.007 seconds.
   > 
   > ❯ select NULL IN (1, 2);
   > +----------------------------------------------------------------------+
   > | NULL IN (Map { iter: Iter([Literal(Int64(1)), Literal(Int64(2))]) }) |
   > +----------------------------------------------------------------------+
   > |                                                                      |
   > +----------------------------------------------------------------------+
   > 1 row in set. Query took 0.002 seconds.
   > ```
   > 
   > > As for the second case, `SELECT .. WHERE x IN (NULL, …)` would be 
simplified to `SELECT .. WHERE x IN (…)` as it could still return rows for the 
other exprs in the list.
   > 
   > I actually tried this out and it seems like this is NOT a correct 
simplification (sorry for my confusion). So maybe we can't simplify such select 
lists 🤔
   > 
   > ```sql
   > DataFusion CLI v34.0.0
   > ❯ create table t (x int) as values (1), (2);
   > 0 rows in set. Query took 0.006 seconds.
   > 
   > ❯ select x IN (NULL, 2) from t;
   > +-----------------------------------------------------------------+
   > | t.x IN (Map { iter: Iter([Literal(NULL), Literal(Int64(2))]) }) |
   > +-----------------------------------------------------------------+
   > |                                                                 |
   > | true                                                            |
   > +-----------------------------------------------------------------+
   > 2 rows in set. Query took 0.002 seconds.
   > 
   > ❯ select x IN (2) from t;
   > +--------------------------------------------------+
   > | t.x IN (Map { iter: Iter([Literal(Int64(2))]) }) |
   > +--------------------------------------------------+
   > | false                                            |
   > | true                                             |
   > +--------------------------------------------------+
   > 2 rows in set. Query took 0.001 seconds.
   > 
   > ❯
   > ```
   > 
   > Also, the same in postgres:
   > 
   > ```sql
   > postgres=# create table t (x int);
   > CREATE TABLE
   > postgres=# insert into t values (1), (2);
   > INSERT 0 2
   > 
   > postgres=# select x in (NULL, 2) from t;
   >  ?column?
   > ----------
   > 
   >  t
   > (2 rows)
   > 
   > postgres=# select x in (2) from t;
   >  ?column?
   > ----------
   >  f
   >  t
   > (2 rows)
   > ```
   
   Thanks for explain this! 👍


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to