eejbyfeldt opened a new issue, #13060:
URL: https://github.com/apache/datafusion/issues/13060

   ### Describe the bug
   
   Currently we do not consider the volatility of expressions in `SimplifyExpressions`. This leads to rewrites that can change query results and cause unexpected behavior.
   
   ### To Reproduce
   
   Consider the following query:
   ```
   > explain select * from VALUES (1), (2) where random() = 0 OR (column1 = 2 AND random() = 0);
   +---------------+---------------------------------------------+
   | plan_type     | plan                                        |
   +---------------+---------------------------------------------+
   | logical_plan  | Filter: random() = Float64(0)               |
   |               |   Values: (Int64(1)), (Int64(2))            |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
   |               |   FilterExec: random() = 0                  |
   |               |     ValuesExec                              |
   |               |                                             |
   +---------------+---------------------------------------------+
   2 row(s) fetched. 
   Elapsed 0.013 seconds.
   ```
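   The predicate gets simplified to `random() = 0`. Because each call to the volatile `random()` function produces a new value, the two `random() = 0` terms are not equivalent, so the rewrite `A OR (B AND A) => A` does not preserve the result here.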
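   To see why the two forms can disagree, here is a plain Rust sketch that models each `random()` call site as a separate input value. This is an illustration under that assumption only; the functions `original` and `simplified` are hypothetical and not part of DataFusion. A row with `column1 = 2`, a first draw of `0.5` and a second draw of `0.0` passes the original predicate but is rejected by the simplified one.

```rust
/// Hypothetical model of the original predicate:
/// `random() = 0 OR (column1 = 2 AND random() = 0)`.
/// Each `random()` call site gets its own argument (`r1`, `r2`), because a
/// volatile function is evaluated independently wherever it appears.
fn original(column1: i64, r1: f64, r2: f64) -> bool {
    r1 == 0.0 || (column1 == 2 && r2 == 0.0)
}

/// Hypothetical model of the simplified predicate `random() = 0`:
/// only the first draw survives the rewrite.
fn simplified(r1: f64) -> bool {
    r1 == 0.0
}

fn main() {
    // A row where the first draw is non-zero but the second draw is zero.
    let (column1, r1, r2) = (2, 0.5, 0.0);
    println!("original:   {}", original(column1, r1, r2)); // prints "original:   true"
    println!("simplified: {}", simplified(r1)); // prints "simplified: false"

    // With a single shared draw (deterministic behavior) the two forms agree.
    for r in [0.0, 0.5] {
        for c in [1, 2] {
            assert_eq!(original(c, r, r), simplified(r));
        }
    }
}
```

   When both draws are equal, as they would be for a deterministic expression, the two predicates always agree; the rewrite is only sound under that condition.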
   
   ### Expected behavior
   
   The predicate should be left as is, rather than simplified in a way that deduplicates the volatile expressions:
   ```
   > explain select * from VALUES (1), (2) where random() = 0 OR (column1 = 2 AND random() = 0);
   +---------------+-------------------------------------------------------------------------------+
   | plan_type     | plan                                                                          |
   +---------------+-------------------------------------------------------------------------------+
   | logical_plan  | Filter: random() = Float64(0) OR column1 = Int64(2) AND random() = Float64(0) |
   |               |   Values: (Int64(1)), (Int64(2))                                              |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                                   |
   |               |   FilterExec: random() = 0                                                    |
   |               |     ValuesExec                                                                |
   |               |                                                                               |
   +---------------+-------------------------------------------------------------------------------+
   2 row(s) fetched. 
   Elapsed 0.013 seconds.
   random() = CAST(Int64(0) AS Float64) OR column1 = Int64(2) AND random() = CAST(Int64(0) AS Float64)
   ```
   
   ### Additional context
   
   We cannot exclude volatile expressions from simplification outright, as we would still like to simplify, for example, the following predicate:
   ```
   > explain select * from VALUES (1), (2) where column1 = 2 OR (column1 = 2 AND random() = 0);
   +---------------+---------------------------------------------+
   | plan_type     | plan                                        |
   +---------------+---------------------------------------------+
   | logical_plan  | Filter: column1 = Int64(2)                  |
   |               |   Values: (Int64(1)), (Int64(2))            |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
   |               |   FilterExec: column1@0 = 2                 |
   |               |     ValuesExec                              |
   |               |                                             |
   +---------------+---------------------------------------------+
   2 row(s) fetched. 
   Elapsed 0.015 seconds.
   ```
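   This rewrite is safe because it does not change the result: `column1 = 2 OR (column1 = 2 AND random() = 0)` is true exactly when `column1 = 2` is true, whatever `random()` returns.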
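   One way to express the requested behavior is to keep the rewrite but guard it with a volatility check on the term being deduplicated. The following is a minimal sketch using a hypothetical toy `Expr` enum and `simplify_or` function (not DataFusion's actual `Expr` type or simplifier API): `A OR (B AND A)` and `A OR (A AND B)` collapse to `A` only when `A` contains no volatile call, so the first predicate from this issue is left unchanged while the `column1 = 2` example still simplifies.

```rust
/// Hypothetical toy expression tree (not DataFusion's `Expr`).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Float(f64),
    /// Volatile: every evaluation may return a different value.
    Random,
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// An expression is volatile if any sub-expression is volatile.
    fn is_volatile(&self) -> bool {
        match self {
            Expr::Random => true,
            Expr::Column(_) | Expr::Int(_) | Expr::Float(_) => false,
            Expr::Eq(l, r) | Expr::And(l, r) | Expr::Or(l, r) => {
                l.is_volatile() || r.is_volatile()
            }
        }
    }
}

// Small constructors to keep the examples readable.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
fn eq(l: Expr, r: Expr) -> Expr {
    Expr::Eq(Box::new(l), Box::new(r))
}
fn and(l: Expr, r: Expr) -> Expr {
    Expr::And(Box::new(l), Box::new(r))
}
fn or(l: Expr, r: Expr) -> Expr {
    Expr::Or(Box::new(l), Box::new(r))
}

/// Rewrites `A OR (B AND A)` or `A OR (A AND B)` to `A`, but only when `A`
/// is not volatile; otherwise the expression is returned unchanged.
fn simplify_or(expr: Expr) -> Expr {
    if let Expr::Or(a, rhs) = &expr {
        if let Expr::And(b1, b2) = &**rhs {
            let absorbs = **b1 == **a || **b2 == **a;
            if absorbs && !a.is_volatile() {
                return (**a).clone();
            }
        }
    }
    expr
}

fn main() {
    let random_eq_0 = eq(Expr::Random, Expr::Float(0.0));
    let col_eq_2 = eq(col("column1"), Expr::Int(2));

    // random() = 0 OR (column1 = 2 AND random() = 0): left unchanged.
    let volatile = or(random_eq_0.clone(), and(col_eq_2.clone(), random_eq_0.clone()));
    assert_eq!(simplify_or(volatile.clone()), volatile);

    // column1 = 2 OR (column1 = 2 AND random() = 0): still becomes column1 = 2.
    let stable = or(col_eq_2.clone(), and(col_eq_2.clone(), random_eq_0));
    assert_eq!(simplify_or(stable), col_eq_2);

    println!("both cases behave as described");
}
```

   Checking only the duplicated term `A`, rather than the whole expression, is what keeps the second case simplifiable: the volatile `random() = 0` sits in `B`, which is dropped without changing the result.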