alamb opened a new issue, #8687: URL: https://github.com/apache/arrow-datafusion/issues/8687
### Is your feature request related to a problem or challenge? @my-vegetable-has-exploded and I had an observation on https://github.com/apache/arrow-datafusion/pull/8654#discussion_r1437009620 where there are certain patterns that can never be true. For example, `WHERE x IN (1,2,3) AND x IN (4,5)` can never be true because the `IN` lists are disjoint. However, DataFusion can't prove this today and will evaluate that expression. So for example, given ```sql DataFusion CLI v34.0.0 ❯ create table t(x int) as values (1), (2), (3); 0 rows in set. Query took 0.003 seconds. ``` This query: ``` ❯ explain select x from t where x IN (1,2,3) AND x IN (4,5); +---------------+-----------------------------------------------------------------------------------------------------+ | plan_type | plan | +---------------+-----------------------------------------------------------------------------------------------------+ | logical_plan | Filter: (t.x = Int32(1) OR t.x = Int32(2) OR t.x = Int32(3)) AND (t.x = Int32(4) OR t.x = Int32(5)) | | | TableScan: t projection=[x] | | physical_plan | CoalesceBatchesExec: target_batch_size=8192 | | | FilterExec: (x@0 = 1 OR x@0 = 2 OR x@0 = 3) AND (x@0 = 4 OR x@0 = 5) | < ---- this filter still runs even though it will never be true | | MemoryExec: partitions=1, partition_sizes=[1] | | | | +---------------+-----------------------------------------------------------------------------------------------------+ 2 rows in set. Query took 0.003 seconds. ``` For other predicates that datafusion knows can not be true, the query is greatly simplified: ``` ❯ explain select x from t where false; +---------------+---------------+ | plan_type | plan | +---------------+---------------+ | logical_plan | EmptyRelation | | physical_plan | EmptyExec | | | | +---------------+---------------+ ``` ### Describe the solution you'd like It might be nice to extend the [ExprSimplifier](https://docs.rs/datafusion/latest/datafusion/optimizer/simplify_expressions/struct.ExprSimplifier.html#) for logical expr simplification to handle these cases **Impact**: I don't know how prevalent such "obviously false" expressions are in actual work loads, but my sense is if it is straightforward to prove the expression can never be true we should use that information while executing queries. ### Describe alternatives you've considered _No response_ ### Additional context The code for LiteralGuarantees is related but works on PhysicalExpressions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
