eejbyfeldt opened a new issue, #13060: URL: https://github.com/apache/datafusion/issues/13060
### Describe the bug

Currently we do not consider the volatility of expressions in `SimplifyExpressions`. This leads to rewrites that can change the results and cause unexpected behavior.

### To Reproduce

Consider the following query:

```
> explain select * from VALUES (1), (2) where random() = 0 OR (column1 = 2 AND random() = 0);
+---------------+---------------------------------------------+
| plan_type     | plan                                        |
+---------------+---------------------------------------------+
| logical_plan  | Filter: random() = Float64(0)               |
|               |   Values: (Int64(1)), (Int64(2))            |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
|               |   FilterExec: random() = 0                  |
|               |     ValuesExec                              |
|               |                                             |
+---------------+---------------------------------------------+
2 row(s) fetched.
Elapsed 0.013 seconds.
```

The predicate gets simplified into `random() = 0`, collapsing the two separate `random()` calls into one.

### Expected behavior

The predicate should not be simplified in a way that deduplicates the volatile expressions.

```
> explain select * from VALUES (1), (2) where random() = 0 OR (column1 = 2 AND random() = 0);
+---------------+----------------------------------------------------------------------------------+
| plan_type     | plan                                                                             |
+---------------+----------------------------------------------------------------------------------+
| logical_plan  | Filter: random() = Float64(0) OR column1 = Int64(2) AND random() = Float64(0)    |
|               |   Values: (Int64(1)), (Int64(2))                                                 |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                      |
|               |   FilterExec: random() = 0                                                       |
|               |     ValuesExec                                                                   |
|               |                                                                                  |
+---------------+----------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.013 seconds.
random() = CAST(Int64(0) AS Float64) OR column1 = Int64(2) AND random() = CAST(Int64(0) AS Float64)
```

### Additional context

We cannot exclude volatile expressions from simplification outright, as we would still like to simplify, for example, the following predicate:

```
> explain select * from VALUES (1), (2) where column1 = 2 OR (column1 = 2 AND random() = 0);
+---------------+---------------------------------------------+
| plan_type     | plan                                        |
+---------------+---------------------------------------------+
| logical_plan  | Filter: column1 = Int64(2)                  |
|               |   Values: (Int64(1)), (Int64(2))            |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
|               |   FilterExec: column1@0 = 2                 |
|               |     ValuesExec                              |
|               |                                             |
+---------------+---------------------------------------------+
2 row(s) fetched.
Elapsed 0.015 seconds.
```

That rewrite does not change the result.
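
To make the distinction between the two queries concrete, here is a minimal sketch of a volatility-aware version of the rewrite `A OR (B AND A)` to `A`. It is a toy model in Python, not DataFusion's Rust code or API: the names `Expr`, `is_volatile`, and `simplify_or` are hypothetical, and the sketch assumes `random` is the only volatile function. The rule fires only when the repeated sub-expression `A` is not volatile.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Toy expression node (hypothetical, not DataFusion's Expr)."""
    op: str                        # "or", "and", "eq", "col", "lit", "call"
    args: Tuple["Expr", ...] = ()  # child expressions
    name: str = ""                 # column name, function name, or literal text


# Assumption for this sketch: random() is the only volatile function.
VOLATILE_FUNCTIONS = {"random"}


def is_volatile(e: Expr) -> bool:
    """True if the expression contains a call to a volatile function."""
    if e.op == "call" and e.name in VOLATILE_FUNCTIONS:
        return True
    return any(is_volatile(a) for a in e.args)


def simplify_or(e: Expr) -> Expr:
    """Rewrite A OR (B AND A) to A, but only when A is not volatile."""
    if e.op != "or":
        return e
    left, right = e.args
    if right.op == "and" and left in right.args and not is_volatile(left):
        return left
    return e


# Small constructors to keep the examples readable.
def col(n): return Expr("col", name=n)
def lit(v): return Expr("lit", name=str(v))
def call(n): return Expr("call", name=n)
def eq(a, b): return Expr("eq", (a, b))
def and_(a, b): return Expr("and", (a, b))
def or_(a, b): return Expr("or", (a, b))


rand_eq_0 = eq(call("random"), lit(0))
col_eq_2 = eq(col("column1"), lit(2))

# random() = 0 OR (column1 = 2 AND random() = 0): left unchanged.
volatile_case = or_(rand_eq_0, and_(col_eq_2, rand_eq_0))

# column1 = 2 OR (column1 = 2 AND random() = 0): simplified to column1 = 2.
stable_case = or_(col_eq_2, and_(col_eq_2, rand_eq_0))
```

In this model, `simplify_or(volatile_case)` returns the predicate unchanged, while `simplify_or(stable_case)` returns `column1 = 2` even though the dropped branch contains `random()`. The check is applied to the repeated sub-expression rather than to the whole predicate, which is what keeps the second simplification available.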