alamb opened a new issue, #9678: URL: https://github.com/apache/arrow-datafusion/issues/9678
### Describe the bug While updating DataFusion in InfluxDB we found what appears to be a bug in common_subexpr_eliminate that causes incorrect answers that seems to have come in via the `TreeNode` refactor in https://github.com/apache/arrow-datafusion/pull/8891 ### To Reproduce ```sql > create table m(f64 double) as values (10.1); 0 rows in set. Query took 0.038 seconds. -- This query is correct (note that acos is 1.471623942988976) ❯ select f64, coalesce(1.0 / f64, 0.0), acos(coalesce(1.0 / f64, 0.0)) from m; +------+-----------------------------------------+-----------------------------------------------+ | f64 | coalesce(Float64(1) / m.f64,Float64(0)) | acos(coalesce(Float64(1) / m.f64,Float64(0))) | +------+-----------------------------------------+-----------------------------------------------+ | 10.1 | 0.09900990099009901 | 1.471623942988976 | +------+-----------------------------------------+-----------------------------------------------+ 1 row in set. Query took 0.001 seconds. -- However, when I add an alias `as f64_1` to the second argument the query is now incorrect (the acos ...) term now is `0.09900990099009901 which is the same as 1/10.1`) ❯ select f64, coalesce(1.0 / f64, 0.0) as f64_1, acos(coalesce(1.0 / f64, 0.0)) from m; +------+---------------------+-----------------------------------------------+ | f64 | f64_1 | acos(coalesce(Float64(1) / m.f64,Float64(0))) | +------+---------------------+-----------------------------------------------+ | 10.1 | 0.09900990099009901 | 0.09900990099009901 | +------+---------------------+-----------------------------------------------+ 1 row in set. Query took 0.002 seconds. ``` ### Expected behavior The correct answer is: ```sql ❯ select f64, coalesce(1.0 / f64, 0.0) as f64_1, acos(coalesce(1.0 / f64, 0.0)) from m; +------+---------------------+-----------------------------------------------+ | f64 | f64_1 | acos(coalesce(Float64(1) / m.f64,Float64(0))) | +------+---------------------+-----------------------------------------------+ | 10.1 | 0.09900990099009901 | 1.471623942988976 | +------+---------------------+-----------------------------------------------+ 1 row in set. Query took 0.008 seconds. ``` ### Additional context 1. `coalesce` seems to be required to trigger the bug. Removing it from the query results in ```sql ❯ select f64, 1.0/f64 as f64_1, acos(1.0 / f64) from m; +------+---------------------+--------------------------+ | f64 | f64_1 | acos(Float64(1) / m.f64) | +------+---------------------+--------------------------+ | 10.1 | 0.09900990099009901 | 1.471623942988976 | +------+---------------------+--------------------------+ 1 row in set. Query took 0.003 seconds. ``` 2. It seems to have come in via https://github.com/apache/arrow-datafusion/pull/8891 The bug happens in a84e5f89bd52d59c78f11fffbab89ed1d418538f, but does not happen in 684b4fab75841672aabc4f3c2475e8c232acea20 (the commit right before https://github.com/apache/arrow-datafusion/pull/8891) ```sql ❯ select f64, coalesce(1.0 / f64, 0.0) as f64_1, acos(coalesce(1.0 / f64, 0.0)) from m; +------+---------------------+-----------------------------------------------+ | f64 | f64_1 | acos(coalesce(Float64(1) / m.f64,Float64(0))) | +------+---------------------+-----------------------------------------------+ | 10.1 | 0.09900990099009901 | 1.471623942988976 | +------+---------------------+-----------------------------------------------+ 1 row in set. Query took 0.013 seconds. ``` 3. It happens for other functions (like `cos`) even though we only saw it for `acos` in IOx I tracked the issue down to a problem in common_subexpr_eliminate. You can see the issue via `explain verbose`: ```sql ❯ explain verbose select f64, coalesce(1.0 / f64, 0.0) as f64_1, acos(coalesce(1.0 / f64, 0.0)) from m; +------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | initial_logical_plan | Projection: m.f64, coalesce(Float64(1) / m.f64, Float64(0)) AS f64_1, acos(coalesce(Float64(1) / m.f64, Float64(0))) | | | TableScan: m | ... | | logical_plan after eliminate_cross_join | SAME TEXT AS ABOVE | | logical_plan after common_sub_expression_eliminate | Projection: m.f64, coalesce(Float64(1) / m.f64, Float64(0)) AS f64_1, coalesce(Float64(1) / m.f64, Float64(0)) AS acos(coalesce(Float64(1) / m.f64,Float64(0))) | | | Projection: coalesce(Float64(1) / m.f64, Float64(0)) AS coalesce(Float64(1) / m.f64, Float64(0)), m.f64 | | | TableScan: m | | logical_plan after eliminate_limit | SAME TEXT AS ABOVE | | logical_plan after propagate_empty_relation | SAME TEXT AS ABOVE | | logical_plan after eliminate_one_union | SAME TEXT AS ABOVE ``` You can see `coalesce(Float64(1) / m.f64, Float64(0)) AS acos(coalesce(Float64(1) / m.f64,Float64(0)))` shows clearly the `acos` was removed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
