AdamGS opened a new pull request, #21216: URL: https://github.com/apache/datafusion/pull/21216
## Which issue does this PR close? - Closes #. ## Rationale for this change Similar to #21128, just trying to shave time off the optimizer. Locally, it improves some sql-planner benchmarks by up to 10% but they seem relatively noisy on my laptop. ## What changes are included in this PR? 1. Avoid allocation `plan.children()` in a loop in `sort_pushdown.rs`. 2. Try and avoid some expensive tree rewrites in `join_selection.rs` 3. Avoid deep clones of exec limit nodes in `limit_pushdown.rs`, and only mutate the plan if it was actually changed. 4. Use cheaper code path to change the limit on an `AggregateExec` in `limited_distinct_aggregation.rs`. 5. Use a read-only traversal in `sanity_checker.rs`. Its read only and `transform_up` is always more expensive. I've considered extending `TreeNode` but this seems to be basically the only place in the codebase that does something like this. There are a few places where we unconditionally return `Transformed::yes` which might unintended downstream consequences because it breaks pointer equality and I think it also just end up allocating more memory, but they are harder to untangle so I'll try and do them in followups. ## Are these changes tested? One new test for limits, otherwise the existing tests. ## Are there any user-facing changes? Removes the `LimitExec` type, I can't imagine why someone would use it, and its only used in one place. Happy to bring it back as a deprecated type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
