peter-toth commented on PR #11683:
URL: https://github.com/apache/datafusion/pull/11683#issuecomment-2271363280

   I believe I understood what happened here.
   In https://github.com/apache/datafusion/pull/10835 the 
`CommonSubexprEliminate` rule [was converted to a `ApplyOrder::TopDown` 
rule](https://github.com/apache/datafusion/pull/10835/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0R577-R579),
 but also [the `self.rewrite()` call was 
kept](https://github.com/apache/datafusion/pull/10835/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0R195)
 and caused a weird plan traversal where some of the nodes are traversed 2 
times.
   
   The main problem with the top-down conversion is that in some cases 
`CommonSubexprEliminate` collects adjacent nodes (e.g. `Window`) into groups 
and once we eliminated subexpression among those groups we transform back the 
result into adjacent nodes. This kind of transformations can't be definied with 
a simple top-down optimizer rule. Most likely this is why the rule handled 
recursion itself before https://github.com/apache/datafusion/pull/10835.
   
   Then I added `test_non_top_level_common_expression` test in this PR  and 
noticed that a non root project node was not CSEd. But I made a mistake as the 
unit tests in `common_subexpr_eliminate.rs` don't actually use an `Optimizer` 
(configured to use `CommonSubexprEliminate`) but just invoke 
`CommonSubexprEliminate::rewrite()` directly. So despite the 
`CommonSubexprEliminate` was top-down, the test couldn't succeed.
   
   So I think the right fix is to:
   - Let the rule handle recursion (i.e. change `apply_order()` return `None`) 
to avoid double recursion.
   - Change `assert_optimized_plan_eq` to use an `Optimizer` to be able to test 
cases like `test_non_top_level_common_expression`.
   
   I've rebased the previous commit on the latest `main` and added a 2nd 
[commit](https://github.com/apache/datafusion/pull/11683/commits/6a6281120ff29952ce5b34656ff8c3645e863d8e)
 with the above 2 points.
   
   I ran some benchmarks and this PR is now sometimes better than `main`:
   ```
   % critcmp main make-cse-top-down-like
   group                                         main                           
         make-cse-top-down-like
   -----                                         ----                           
         ----------------------
   logical_aggregate_with_join                   1.01   565.4±14.92µs        ? 
?/sec     1.00    561.0±9.20µs        ? ?/sec
   logical_plan_tpcds_all                        1.03     77.5±2.96ms        ? 
?/sec     1.00     75.2±1.10ms        ? ?/sec
   logical_plan_tpch_all                         1.00      7.7±0.09ms        ? 
?/sec     1.00      7.7±0.08ms        ? ?/sec
   logical_select_all_from_1000                  1.00     15.0±0.26ms        ? 
?/sec     1.01     15.2±0.18ms        ? ?/sec
   logical_select_one_from_700                   1.00   405.6±10.82µs        ? 
?/sec     1.00   404.3±10.31µs        ? ?/sec
   logical_trivial_join_high_numbered_columns    1.01    402.7±5.89µs        ? 
?/sec     1.00    399.7±6.28µs        ? ?/sec
   logical_trivial_join_low_numbered_columns     1.01    383.7±7.82µs        ? 
?/sec     1.00    378.9±5.03µs        ? ?/sec
   physical_plan_tpcds_all                       1.09  590.3±107.45ms        ? 
?/sec     1.00    543.2±3.93ms        ? ?/sec
   physical_plan_tpch_all                        1.01     34.1±0.47ms        ? 
?/sec     1.00     33.9±0.58ms        ? ?/sec
   physical_plan_tpch_q1                         1.00  1105.2±12.43µs        ? 
?/sec     1.00  1110.3±23.70µs        ? ?/sec
   physical_plan_tpch_q10                        1.01  1550.2±38.57µs        ? 
?/sec     1.00  1537.9±19.03µs        ? ?/sec
   physical_plan_tpch_q11                        1.01  1328.8±16.32µs        ? 
?/sec     1.00  1313.9±17.56µs        ? ?/sec
   physical_plan_tpch_q12                        1.01  1108.5±18.29µs        ? 
?/sec     1.00  1096.9±16.36µs        ? ?/sec
   physical_plan_tpch_q13                        1.00   755.8±10.98µs        ? 
?/sec     1.00   752.3±10.62µs        ? ?/sec
   physical_plan_tpch_q14                        1.04   942.4±73.52µs        ? 
?/sec     1.00   908.6±11.28µs        ? ?/sec
   physical_plan_tpch_q16                        1.01  1345.2±26.82µs        ? 
?/sec     1.00  1334.1±34.09µs        ? ?/sec
   physical_plan_tpch_q17                        1.01  1200.3±50.67µs        ? 
?/sec     1.00  1188.0±22.75µs        ? ?/sec
   physical_plan_tpch_q18                        1.02  1433.4±102.83µs        ? 
?/sec    1.00  1402.0±15.42µs        ? ?/sec
   physical_plan_tpch_q19                        1.02      2.6±0.20ms        ? 
?/sec     1.00      2.6±0.17ms        ? ?/sec
   physical_plan_tpch_q2                         1.00      2.9±0.04ms        ? 
?/sec     1.00      2.9±0.04ms        ? ?/sec
   physical_plan_tpch_q20                        1.04  1673.2±79.81µs        ? 
?/sec     1.00  1616.4±20.52µs        ? ?/sec
   physical_plan_tpch_q21                        1.04      2.4±0.10ms        ? 
?/sec     1.00      2.3±0.03ms        ? ?/sec
   physical_plan_tpch_q22                        1.02  1165.8±27.17µs        ? 
?/sec     1.00  1147.1±35.94µs        ? ?/sec
   physical_plan_tpch_q3                         1.00  1102.3±16.77µs        ? 
?/sec     1.00  1099.9±12.63µs        ? ?/sec
   physical_plan_tpch_q4                         1.00    846.3±8.77µs        ? 
?/sec     1.00    844.0±9.90µs        ? ?/sec
   physical_plan_tpch_q5                         1.01  1661.4±23.93µs        ? 
?/sec     1.00  1641.8±19.28µs        ? ?/sec
   physical_plan_tpch_q6                         1.00    584.7±9.64µs        ? 
?/sec     1.00   583.2±20.63µs        ? ?/sec
   physical_plan_tpch_q7                         1.00      2.1±0.03ms        ? 
?/sec     1.00      2.1±0.21ms        ? ?/sec
   physical_plan_tpch_q8                         1.01      2.7±0.04ms        ? 
?/sec     1.00      2.6±0.04ms        ? ?/sec
   physical_plan_tpch_q9                         1.01  1932.5±23.29µs        ? 
?/sec     1.00  1906.6±21.08µs        ? ?/sec
   physical_select_all_from_1000                 1.00     35.2±1.20ms        ? 
?/sec     1.01     35.5±0.48ms        ? ?/sec
   physical_select_one_from_700                  1.00  1781.3±59.97µs        ? 
?/sec     1.01  1805.4±43.25µs        ? ?/sec
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to