kosiew opened a new pull request, #20664:
URL: https://github.com/apache/datafusion/pull/20664
## Which issue does this PR close?
* Part of #20002.
## Rationale for this change
The `PushDownFilter` optimizer rule shows a severe planning-time performance
pathology in the `sql_planner_extended` benchmark: profiling indicates that it
dominates total planning CPU time and repeatedly recomputes expression types.
This PR adds a deterministic, CASE-heavy LEFT JOIN benchmark that reliably
reproduces the worst-case behavior. It also introduces lightweight, debug-only
timing and counters inside `push_down_filter` to make it easier to pinpoint
expensive sub-sections (e.g. predicate simplification and join predicate
inference) during profiling.
## What changes are included in this PR?
* **Benchmark: add a deterministic CASE-heavy LEFT JOIN workload**
  * Adds `build_case_heavy_left_join_query` and helpers to construct a
    CASE-nested predicate chain over a `LEFT JOIN`.
  * Adds a new benchmark `logical_plan_optimize_case_heavy_left_join` to
    stress planning/optimization time.
  * Adds an A/B benchmark group `push_down_filter_case_heavy_left_join_ab`
    that sweeps predicate counts and CASE depth, comparing:
    * default optimizer with `push_down_filter` enabled
    * optimizer with `push_down_filter` removed
* **Optimizer instrumentation (debug-only)**
  * Adds a small `with_debug_timing` helper gated by `log_enabled!(Debug)`
    to record microsecond timings for specific sections.
  * Instruments and logs:
    * time spent in `infer_join_predicates`
    * time spent in `simplify_predicates`
    * counts of parent predicates, `on_filters`, inferred join predicates
    * before/after predicate counts for simplification
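
For readers who have not opened the diff, the CASE-nested predicate chain over a `LEFT JOIN` could be built roughly as sketched below. This is an illustrative assumption, not the PR's actual code: the function names `nested_case_expr` and `build_case_heavy_left_join_query_sketch`, the table aliases `l` and `r`, the columns `c0`, `c1`, ..., and the exact predicate shape are all hypothetical. It only shows how two knobs (predicate count and CASE depth) can produce a deterministic query string.

```rust
/// Hypothetical sketch: wraps column `col` in `depth` levels of CASE.
/// Depth 0 returns the bare column reference. Nesting happens only in
/// the THEN branch, so the expression grows linearly with depth.
fn nested_case_expr(col: &str, depth: usize) -> String {
    let mut expr = col.to_string();
    for level in 0..depth {
        expr = format!("CASE WHEN {col} > {level} THEN {expr} ELSE {level} END");
    }
    expr
}

/// Hypothetical sketch: builds a LEFT JOIN query whose WHERE clause is an
/// AND-chain of `predicate_count` CASE-nested predicates. Table and column
/// names are made up for illustration. The output depends only on the two
/// arguments, so repeated runs see the same SQL text.
fn build_case_heavy_left_join_query_sketch(predicate_count: usize, case_depth: usize) -> String {
    let predicates: Vec<String> = (0..predicate_count)
        .map(|i| {
            let col = format!("r.c{i}");
            format!("{} IS NOT NULL", nested_case_expr(&col, case_depth))
        })
        .collect();

    let mut sql = String::from("SELECT l.id FROM l LEFT JOIN r ON l.id = r.id");
    if !predicates.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&predicates.join(" AND "));
    }
    sql
}
```

Sweeping `predicate_count` and `case_depth` over a small grid would then give the kind of parameter sweep the A/B group describes, with each generated query optimized once with and once without `push_down_filter`.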
## Are these changes tested?
* No new unit/integration tests were added because this PR is focused on
  **benchmarking and debug-only instrumentation** rather than changing optimizer
  semantics.
* Coverage is provided by:
  * compiling/running the `sql_planner_extended` benchmark
  * validating both benchmark variants (with/without `push_down_filter`)
    produce optimized plans without errors
  * enabling `RUST_LOG=debug` to confirm timing sections and counters emit
    as expected
## Are there any user-facing changes?
* No user-facing behavior changes.
* The optimizer logic is unchanged; only **debug logging** is added (emitted
  only when `RUST_LOG` enables Debug for the relevant modules).
* Benchmark suite additions only affect developers running benches.
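
To illustrate why the instrumentation should cost nothing when debug logging is off, here is a minimal std-only sketch of a timing wrapper gated on a flag. It is an assumption-laden stand-in, not the PR's `with_debug_timing`: the name `with_debug_timing_sketch`, the `debug_enabled` parameter (standing in for the `log_enabled!(Debug)` check), the `eprintln!` output, and the returned `Option<Duration>` are all hypothetical choices made so the example runs without extra crates.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a debug-gated timing helper.
/// `debug_enabled` plays the role of the `log_enabled!(Debug)` check;
/// the real helper presumably logs through the `log` crate instead of
/// printing to stderr, and need not return the duration.
fn with_debug_timing_sketch<T>(
    debug_enabled: bool,
    label: &str,
    f: impl FnOnce() -> T,
) -> (T, Option<Duration>) {
    if !debug_enabled {
        // Fast path: run the section without reading the clock or formatting.
        return (f(), None);
    }
    let start = Instant::now();
    let result = f();
    let elapsed = start.elapsed();
    // Report the section time in microseconds, as the PR text describes.
    eprintln!("{label}: {}us", elapsed.as_micros());
    (result, Some(elapsed))
}
```

With the flag off, the wrapped closure runs exactly as it would unwrapped, which matches the claim that optimizer behavior is unchanged.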
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.