alamb opened a new issue, #8688:
URL: https://github.com/apache/arrow-datafusion/issues/8688

   ### Is your feature request related to a problem or challenge?
   
   `SELECT .. WHERE NULL IN (1,2,3)` and `SELECT ... WHERE x in (NULL, 2, 3)` 
are always `NULL` (and thus will filter out all rows). However, DataFusion will 
still try and evaluate a predicate:
   
   ```sql
   DataFusion CLI v34.0.0
   ❯ create table t(x int) as values (1), (2), (3);
   0 rows in set. Query took 0.003 seconds.
   ```
   
   Note there is a `FilterExec` with a non trivial expression in both of the 
following queries:
   
   ```sql
   ❯ explain select x from t where x IN (null, 2, 3);
   
+---------------+---------------------------------------------------------------+
   | plan_type     | plan                                                       
   |
   
+---------------+---------------------------------------------------------------+
   | logical_plan  | Filter: t.x = Int32(NULL) OR t.x = Int32(2) OR t.x = 
Int32(3) |
   |               |   TableScan: t projection=[x]                              
   |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                
   |
   |               |   FilterExec: x@0 = NULL OR x@0 = 2 OR x@0 = 3             
   |
   |               |     MemoryExec: partitions=1, partition_sizes=[1]          
   |
   |               |                                                            
   |
   
+---------------+---------------------------------------------------------------+
   2 rows in set. Query took 0.002 seconds.
   
   ❯ explain select x from t where null IN (x, 2, 3);
   
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                                                                
                                                                                
                                                                                
                                                                                
         |
   
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Filter: Int64(NULL) IN ([CAST(t.x AS Int64), Int64(2), 
Int64(3)])                                                                      
                                                                                
                                                                                
                                                                                
             |
   |               |   TableScan: t projection=[x]                              
                                                                                
                                                                                
                                                                                
                                                                                
         |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                
                                                                                
                                                                                
                                                                                
                                                                                
         |
   |               |   FilterExec: NULL IN ([CastExpr { expr: Column { name: 
"x", index: 0 }, cast_type: Int64, cast_options: CastOptions { safe: false, 
format_options: FormatOptions { safe: true, null: "", date_format: None, 
datetime_format: None, timestamp_format: None, timestamp_tz_format: None, 
time_format: None, duration_format: Pretty } } }, Literal { value: Int64(2) }, 
Literal { value: Int64(3) }]) |
   |               |     MemoryExec: partitions=1, partition_sizes=[1]          
                                                                                
                                                                                
                                                                                
                                                                                
         |
   |               |                                                            
                                                                                
                                                                                
                                                                                
                                                                                
         |
   
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.004 seconds.
   ```
   
   In both cases the predicate could have been reduced to a single NULL
   
   ### Describe the solution you'd like
   
   I would like to extend the ExprSimplifier rules to handle the case of `NULL 
IN (...)` and when the InList contains `NULL`
   
   Here are some similar rules
   
https://github.com/apache/arrow-datafusion/blob/cc3042a6343457036770267f921bb3b6e726956c/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L474-L549
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   These types of expressions are sometimes generated programmatically during 
rewrites in IOx. 
   
   It also came up with discussions with @yahoNanJing  on 
https://github.com/apache/arrow-datafusion/pull/8669
   
   I think this would be a good first issue as the patterns exist already and 
the need is well defined. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to