peter-toth commented on PR #13177:
URL: https://github.com/apache/datafusion/pull/13177#issuecomment-2452940289

   Thank you @blaginin for improving the PR and pointing me to those 
conversations. Unfortunately I missed those and wasn't aware that there have 
been an attempt to add stack growth libraries.
   
   I followed the commits yesterday and saw how each of them made the 
implementation better and better. I could only suggest minor changes like 
`TransformingState::ProcessingChildren` probably doesn't need to 2 separate 
`vec`s, just an index to track which child nodes have been transformed.
   
   But, on the other hand, I also looked into `stacker` and `recursive`. 
`stacker` seems like a very mature crate that has been used in many projects, 
while `recursive` is just a convenience layer over `stacker` to allow using it 
even simpler.
   
   Honestly, I don't share the concerns about these libraries. `stacker` is 
owned by the rust-lang team and even the Rust compiler itself uses it: 
https://github.com/rust-lang/rust/blob/master/compiler/rustc_data_structures/src/stack.rs#L14-L22.
 `recursive` seems to be created by the Polars guys, and looks like almost only 
they use it.
   
   By running some benchmarks I was referring to running the old vs new APIs on 
large trees with many nodes. IMO the planning benchmarks tell a lot about 
everyday queries, but doesn't contain any extreme cases with very large trees.
   
   I forked your branch yesterday: 
https://github.com/peter-toth/datafusion/commits/recursive-vs-iterative/ (not 
the latest state, but after the visit improvement) and in 3 commits separated 
the old (recursive) and new (iterative) `visit` API implementations + added 
`recursive` to the old + added a small benchmark to just quickly test the old 
vs new `visit` API on tall and wide `Arc<DynTestNode>` trees:
   
   ```
   % cargo bench --bench tree_node
   
   visit old tall tree     time:   [377.79 µs 380.87 µs 385.39 µs]
                           change: [-3.2937% -2.2943% -1.1798%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     8 (8.00%) low mild
     5 (5.00%) high severe
   
   visit new tall tree     time:   [349.62 µs 351.73 µs 353.89 µs]
                           change: [-3.9519% -3.2991% -2.6432%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 21 outliers among 100 measurements (21.00%)
     5 (5.00%) low severe
     2 (2.00%) low mild
     8 (8.00%) high mild
     6 (6.00%) high severe
   
   Benchmarking visit old wide tree: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 86.0s, or reduce sample count to 10.
   visit old wide tree     time:   [875.28 ms 879.02 ms 882.86 ms]
                           change: [-1.2651% -0.7020% -0.0989%] (p = 0.02 < 
0.05)
                           Change within noise threshold.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   Benchmarking visit new wide tree: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 93.1s, or reduce sample count to 10.
   visit new wide tree     time:   [913.97 ms 915.78 ms 917.58 ms]
                           change: [-0.0531% +0.3441% +0.6875%] (p = 0.07 > 
0.05)
                           No change in performance detected.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   ```
   
   It looks like the 2 implementations are comparable. Maybe the recursive 
approach + `stacker` is a bit faster when it comes to large trees (879 vs 915 
ms). (And as we discussed most likely `Expr` is where the iterative approach 
would suffer...)
   
   But overall, I feel I'm still leaning tovards `stacker` for a few reasons:
   - On recursive data structures the recursive algorithms are natural, I find 
it much easier to follow the old API implementation than the new one.
   - I have no concerns with `stacker` / `recursive` and they seem to provide a 
general way to deal with deeply nested recursive calls. Also, these crates can 
be useful in other areas than `TreeNode`s.
   - The `apply` / `visit` / `transform` / `rewrite` APIs are the most common 
ones to process trees, but let's not forget about rules that self organize 
their tree recursion via `apply_children` / `map_children` in a recursive 
manner. I do like that we provide these lower level APIs for unique usecases 
while they can keep the natural recursive implementations.
   - As we discussed there are trees like `Expr` where the iterative approach 
would be much clumsier. Sure, we could a deconstruct a node and store its 
reconsturction closure on the stack and then reconstruct it once we have 
transformed their children, but I feel the current `map_children` is so much 
cleaner in this case.
   - There could be extreme cases with `TreeNode`s as well. I'm thinking of 
`_with_subqueries()` methods of `LogicalPlan` in which we switch between 2 
kinds of trees (`LogicalPlan` / `Expr`) as many times as needed. So stack 
overflow can happen due to multiple different trees are nested into each other 
and this can be trivially handled using `stacker`, but you would need to store 
multiple kinds of `TreeNode`s on the stack, which is again not that 
straightforward.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to