alamb opened a new issue, #8687:
URL: https://github.com/apache/arrow-datafusion/issues/8687

   ### Is your feature request related to a problem or challenge?
   
   @my-vegetable-has-exploded and I had an observation on 
https://github.com/apache/arrow-datafusion/pull/8654#discussion_r1437009620 
where there are certain patterns that can never be true.
   
   For example, `WHERE x IN (1,2,3) AND x IN (4,5)` can never be true because 
the `IN` lists are disjoint. However, DataFusion can't prove this today and 
will evaluate that expression.
   
   So for example, given
   ```sql
   DataFusion CLI v34.0.0
   ❯ create table t(x int) as values (1), (2), (3);
   0 rows in set. Query took 0.003 seconds.
   ```
   
   This query:
   ```
   ❯ explain select x from t where x IN (1,2,3) AND x IN (4,5);
   
+---------------+-----------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                                         |
   
+---------------+-----------------------------------------------------------------------------------------------------+
   | logical_plan  | Filter: (t.x = Int32(1) OR t.x = Int32(2) OR t.x = 
Int32(3)) AND (t.x = Int32(4) OR t.x = Int32(5)) |
   |               |   TableScan: t projection=[x]                              
                                         |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                
                                         |
   |               |   FilterExec: (x@0 = 1 OR x@0 = 2 OR x@0 = 3) AND (x@0 = 4 
OR x@0 = 5)                              | < ---- this filter still runs even 
though it will never be true
   |               |     MemoryExec: partitions=1, partition_sizes=[1]          
                                         |
   |               |                                                            
                                         |
   
+---------------+-----------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.003 seconds.
   ```
   
   For other predicates that datafusion knows can not be true, the query is 
greatly simplified:
   
   ```
   ❯ explain select x from t where false;
   +---------------+---------------+
   | plan_type     | plan          |
   +---------------+---------------+
   | logical_plan  | EmptyRelation |
   | physical_plan | EmptyExec     |
   |               |               |
   +---------------+---------------+
   ```
   
   
   
   
   ### Describe the solution you'd like
   
   It might be nice to extend the  
[ExprSimplifier](https://docs.rs/datafusion/latest/datafusion/optimizer/simplify_expressions/struct.ExprSimplifier.html#)
 for  logical expr simplification to handle these cases
   
   **Impact**: I don't know how prevalent such "obviously false" expressions 
are in actual work loads, but my sense is if it is straightforward to prove the 
expression can never be true we should use that information while executing 
queries. 
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   The code for LiteralGuarantees is related but works on PhysicalExpressions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to