alamb opened a new issue, #8688:
URL: https://github.com/apache/arrow-datafusion/issues/8688
### Is your feature request related to a problem or challenge?
`SELECT .. WHERE NULL IN (1,2,3)` and `SELECT ... WHERE x in (NULL, 2, 3)`
are always `NULL` (and thus will filter out all rows). However, DataFusion will
still try and evaluate a predicate:
```sql
DataFusion CLI v34.0.0
❯ create table t(x int) as values (1), (2), (3);
0 rows in set. Query took 0.003 seconds.
```
Note there is a `FilterExec` with a non trivial expression in both of the
following queries:
```sql
❯ explain select x from t where x IN (null, 2, 3);
+---------------+---------------------------------------------------------------+
| plan_type | plan
|
+---------------+---------------------------------------------------------------+
| logical_plan | Filter: t.x = Int32(NULL) OR t.x = Int32(2) OR t.x =
Int32(3) |
| | TableScan: t projection=[x]
|
| physical_plan | CoalesceBatchesExec: target_batch_size=8192
|
| | FilterExec: x@0 = NULL OR x@0 = 2 OR x@0 = 3
|
| | MemoryExec: partitions=1, partition_sizes=[1]
|
| |
|
+---------------+---------------------------------------------------------------+
2 rows in set. Query took 0.002 seconds.
❯ explain select x from t where null IN (x, 2, 3);
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Filter: Int64(NULL) IN ([CAST(t.x AS Int64), Int64(2),
Int64(3)])
|
| | TableScan: t projection=[x]
|
| physical_plan | CoalesceBatchesExec: target_batch_size=8192
|
| | FilterExec: NULL IN ([CastExpr { expr: Column { name:
"x", index: 0 }, cast_type: Int64, cast_options: CastOptions { safe: false,
format_options: FormatOptions { safe: true, null: "", date_format: None,
datetime_format: None, timestamp_format: None, timestamp_tz_format: None,
time_format: None, duration_format: Pretty } } }, Literal { value: Int64(2) },
Literal { value: Int64(3) }]) |
| | MemoryExec: partitions=1, partition_sizes=[1]
|
| |
|
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.004 seconds.
```
In both cases the predicate could have been reduced to a single NULL
### Describe the solution you'd like
I would like to extend the ExprSimplifier rules to handle the case of `NULL
IN (...)` and when the InList contains `NULL`
Here are some similar rules
https://github.com/apache/arrow-datafusion/blob/cc3042a6343457036770267f921bb3b6e726956c/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L474-L549
### Describe alternatives you've considered
_No response_
### Additional context
These types of expressions are sometimes generated programmatically during
rewrites in IOx.
It also came up with discussions with @yahoNanJing on
https://github.com/apache/arrow-datafusion/pull/8669
I think this would be a good first issue as the patterns exist already and
the need is well defined.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]