AdamGS opened a new pull request, #20234:
URL: https://github.com/apache/datafusion/pull/20234
## Which issue does this PR close?
- Closes #20134 .
## Rationale for this change
The check for simplifying const expressions was recursive and expensive,
repeatedly checking the expression's children in a recursive way. I've tried
other approached like pre-computing the result for all expressions outside of
the loop and using that cache during the traversal, but I've found that it only
yielded between 5-8% improvement while adding complexity, while this approach
simplifies the code and has shown to be more performant in my benchmarks
(change is compared to current main branch):
```
tpc-ds/q76/cs/16 time: [27.112 µs 27.159 µs 27.214 µs]
change: [−13.533% −13.167% −12.801%] (p = 0.00 <
0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
tpc-ds/q76/ws/16 time: [26.175 µs 26.280 µs 26.394 µs]
change: [−14.312% −13.833% −13.346%] (p = 0.00 <
0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
tpc-ds/q76/cs/128 time: [195.79 µs 196.17 µs 196.56 µs]
change: [−14.362% −14.080% −13.816%] (p = 0.00 <
0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
tpc-ds/q76/ws/128 time: [197.08 µs 197.61 µs 198.23 µs]
change: [−13.531% −13.142% −12.737%] (p = 0.00 <
0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild
```
## What changes are included in this PR?
1. `simplify_const_expr` now only checks itself and whether all of its
children are literals, because it assumes the order of simplification is
bottoms-up.
2. Removes some code from the public API, see the last section for the full
details.
## Are these changes tested?
Existing test suite
## Are there any user-facing changes?
I suggest removing some of the physical expression simplification code from
the public API, which I believe reduces the maintenance burden here. These
changes also helps removing code like the distinct `simplify_const_expr` and
`simplify_const_expr_with_dummy`.
1. Makes all `datafusion-physical-expr::simplifier` sub-modules (`not` and
`const_evaluator`) private, including their key functions. They are not used
externally, and being able to change their behavior seems more valuable long
term. The simplifier is also not currently an extension point as far as I can
tell, so there's no value in providing atomic building blocks like them for now.
2. Removes `has_column_references` completely, its trivial to re-implement
and isn't used anywhere in the codebase.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]