This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new e80694e369 Remove recursive const check in `simplify_const_expr`
(#20234)
e80694e369 is described below
commit e80694e36956a29d24c55ffcff83323b54ec0ca6
Author: Adam Gutglick <[email protected]>
AuthorDate: Tue Feb 24 19:49:37 2026 +0000
Remove recursive const check in `simplify_const_expr` (#20234)
## Which issue does this PR close?
- Closes #20134 .
## Rationale for this change
The check for simplifying const expressions was recursive and expensive,
repeatedly checking the expression's children in a recursive way.
I've tried other approached like pre-computing the result for all
expressions outside of the loop and using that cache during the
traversal, but I've found that it only yielded between 5-8% improvement
while adding complexity, while this approach simplifies the code and
seems to be more performant in my benchmarks (change is compared to
current main branch):
```
tpc-ds/q76/cs/16 time: [27.112 µs 27.159 µs 27.214 µs]
change: [−13.533% −13.167% −12.801%] (p = 0.00 <
0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
tpc-ds/q76/ws/16 time: [26.175 µs 26.280 µs 26.394 µs]
change: [−14.312% −13.833% −13.346%] (p = 0.00 <
0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
tpc-ds/q76/cs/128 time: [195.79 µs 196.17 µs 196.56 µs]
change: [−14.362% −14.080% −13.816%] (p = 0.00 <
0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
tpc-ds/q76/ws/128 time: [197.08 µs 197.61 µs 198.23 µs]
change: [−13.531% −13.142% −12.737%] (p = 0.00 <
0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild
```
## What changes are included in this PR?
1. `simplify_const_expr` now only checks itself and whether all of its
children are literals, because it assumes the order of simplification is
bottoms-up.
2. Removes some code from the public API, see the last section for the
full details.
## Are these changes tested?
Existing test suite
## Are there any user-facing changes?
I suggest removing some of the physical expression simplification code
from the public API, which I believe reduces the maintenance burden
here. These changes also helps removing code like the distinct
`simplify_const_expr` and `simplify_const_expr_with_dummy`.
1. Makes all `datafusion-physical-expr::simplifier` sub-modules (`not`
and `const_evaluator`) private, including their key functions. They are
not used externally, and being able to change their behavior seems more
valuable long term. The simplifier is also not currently an extension
point as far as I can tell, so there's no value in providing atomic
building blocks like them for now.
2. Removes `has_column_references` completely, its trivial to
re-implement and isn't used anywhere in the codebase.
---------
Co-authored-by: Andrew Lamb <[email protected]>
---
.../src/simplifier/const_evaluator.rs | 102 +++++++++++++++++----
datafusion/physical-expr/src/simplifier/mod.rs | 9 +-
datafusion/physical-expr/src/simplifier/not.rs | 4 +
3 files changed, 92 insertions(+), 23 deletions(-)
diff --git a/datafusion/physical-expr/src/simplifier/const_evaluator.rs
b/datafusion/physical-expr/src/simplifier/const_evaluator.rs
index 1e62e47ce2..1f3781c537 100644
--- a/datafusion/physical-expr/src/simplifier/const_evaluator.rs
+++ b/datafusion/physical-expr/src/simplifier/const_evaluator.rs
@@ -39,18 +39,80 @@ use crate::expressions::{Column, Literal};
/// - `1 + 2` -> `3`
/// - `(1 + 2) * 3` -> `9` (with bottom-up traversal)
/// - `'hello' || ' world'` -> `'hello world'`
+#[deprecated(
+ since = "53.0.0",
+ note = "This function will be removed in a future release in favor of a
private implementation that depends on other implementation details. Please
open an issue if you have a use case for keeping it."
+)]
pub fn simplify_const_expr(
expr: Arc<dyn PhysicalExpr>,
) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
- simplify_const_expr_with_dummy(expr, &create_dummy_batch()?)
+ let batch = create_dummy_batch()?;
+ // If expr is already a const literal or can't be evaluated into one.
+ if expr.as_any().is::<Literal>() || (!can_evaluate_as_constant(&expr)) {
+ return Ok(Transformed::no(expr));
+ }
+
+ // Evaluate the expression
+ match expr.evaluate(&batch) {
+ Ok(ColumnarValue::Scalar(scalar)) => {
+ Ok(Transformed::yes(Arc::new(Literal::new(scalar))))
+ }
+ Ok(ColumnarValue::Array(arr)) if arr.len() == 1 => {
+ // Some operations return an array even for scalar inputs
+ let scalar = ScalarValue::try_from_array(&arr, 0)?;
+ Ok(Transformed::yes(Arc::new(Literal::new(scalar))))
+ }
+ Ok(_) => {
+ // Unexpected result - keep original expression
+ Ok(Transformed::no(expr))
+ }
+ Err(_) => {
+ // On error, keep original expression
+ // The expression might succeed at runtime due to short-circuit
evaluation
+ // or other runtime conditions
+ Ok(Transformed::no(expr))
+ }
+ }
}
-pub(crate) fn simplify_const_expr_with_dummy(
+/// Simplify expressions whose immediate children are all literals.
+///
+/// This function only checks the direct children of the expression,
+/// not the entire subtree. It is designed to be used with bottom-up tree
+/// traversal, where children are simplified before parents.
+///
+/// # Example transformations
+/// - `1 + 2` -> `3`
+/// - `(1 + 2) * 3` -> `9` (with bottom-up traversal, inner expr simplified
first)
+/// - `'hello' || ' world'` -> `'hello world'`
+pub(crate) fn simplify_const_expr_immediate(
expr: Arc<dyn PhysicalExpr>,
batch: &RecordBatch,
) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
- // If expr is already a const literal or can't be evaluated into one.
- if expr.as_any().is::<Literal>() || (!can_evaluate_as_constant(&expr)) {
+ // Already a literal - nothing to do
+ if expr.as_any().is::<Literal>() {
+ return Ok(Transformed::no(expr));
+ }
+
+ // Column references cannot be evaluated at plan time
+ if expr.as_any().is::<Column>() {
+ return Ok(Transformed::no(expr));
+ }
+
+ // Volatile nodes cannot be evaluated at plan time
+ if expr.is_volatile_node() {
+ return Ok(Transformed::no(expr));
+ }
+
+ // Since transform visits bottom-up, children have already been simplified.
+ // If all children are now Literals, this node can be const-evaluated.
+ // This is O(k) where k = number of children, instead of O(subtree).
+ let all_children_literal = expr
+ .children()
+ .iter()
+ .all(|child| child.as_any().is::<Literal>());
+
+ if !all_children_literal {
return Ok(Transformed::no(expr));
}
@@ -77,6 +139,20 @@ pub(crate) fn simplify_const_expr_with_dummy(
}
}
+/// Create a 1-row dummy RecordBatch for evaluating constant expressions.
+///
+/// The batch is never actually accessed for data - it's just needed because
+/// the PhysicalExpr::evaluate API requires a RecordBatch. For expressions
+/// that only contain literals, the batch content is irrelevant.
+///
+/// This is the same approach used in the logical expression `ConstEvaluator`.
+pub(crate) fn create_dummy_batch() -> Result<RecordBatch> {
+ // RecordBatch requires at least one column
+ let dummy_schema = Arc::new(Schema::new(vec![Field::new("_",
DataType::Null, true)]));
+ let col = new_null_array(&DataType::Null, 1);
+ Ok(RecordBatch::try_new(dummy_schema, vec![col])?)
+}
+
fn can_evaluate_as_constant(expr: &Arc<dyn PhysicalExpr>) -> bool {
let mut can_evaluate = true;
@@ -93,21 +169,11 @@ fn can_evaluate_as_constant(expr: &Arc<dyn PhysicalExpr>)
-> bool {
can_evaluate
}
-/// Create a 1-row dummy RecordBatch for evaluating constant expressions.
-///
-/// The batch is never actually accessed for data - it's just needed because
-/// the PhysicalExpr::evaluate API requires a RecordBatch. For expressions
-/// that only contain literals, the batch content is irrelevant.
-///
-/// This is the same approach used in the logical expression `ConstEvaluator`.
-pub(crate) fn create_dummy_batch() -> Result<RecordBatch> {
- // RecordBatch requires at least one column
- let dummy_schema = Arc::new(Schema::new(vec![Field::new("_",
DataType::Null, true)]));
- let col = new_null_array(&DataType::Null, 1);
- Ok(RecordBatch::try_new(dummy_schema, vec![col])?)
-}
-
/// Check if this expression has any column references.
+#[deprecated(
+ since = "53.0.0",
+ note = "This function isn't used internally and is trivial to implement,
therefore it will be removed in a future release."
+)]
pub fn has_column_references(expr: &Arc<dyn PhysicalExpr>) -> bool {
let mut has_columns = false;
expr.apply(|expr| {
diff --git a/datafusion/physical-expr/src/simplifier/mod.rs
b/datafusion/physical-expr/src/simplifier/mod.rs
index 45ead82a0a..3f3f857344 100644
--- a/datafusion/physical-expr/src/simplifier/mod.rs
+++ b/datafusion/physical-expr/src/simplifier/mod.rs
@@ -24,9 +24,7 @@ use std::sync::Arc;
use crate::{
PhysicalExpr,
simplifier::{
- const_evaluator::{create_dummy_batch, simplify_const_expr_with_dummy},
- not::simplify_not_expr,
- unwrap_cast::unwrap_cast_in_comparison,
+ const_evaluator::create_dummy_batch,
unwrap_cast::unwrap_cast_in_comparison,
},
};
@@ -67,10 +65,11 @@ impl<'a> PhysicalExprSimplifier<'a> {
// Apply NOT expression simplification first, then unwrap cast
optimization,
// then constant expression evaluation
- let rewritten = simplify_not_expr(node, schema)?
+ #[expect(deprecated, reason = "`simplify_not_expr` is marked
as deprecated until it's made private.")]
+ let rewritten = not::simplify_not_expr(node, schema)?
.transform_data(|node| unwrap_cast_in_comparison(node,
schema))?
.transform_data(|node| {
- simplify_const_expr_with_dummy(node, &batch)
+ const_evaluator::simplify_const_expr_immediate(node,
&batch)
})?;
#[cfg(debug_assertions)]
diff --git a/datafusion/physical-expr/src/simplifier/not.rs
b/datafusion/physical-expr/src/simplifier/not.rs
index ea5467d0a4..709260aa48 100644
--- a/datafusion/physical-expr/src/simplifier/not.rs
+++ b/datafusion/physical-expr/src/simplifier/not.rs
@@ -43,6 +43,10 @@ use crate::expressions::{BinaryExpr, InListExpr, Literal,
NotExpr, in_list, lit}
/// This function applies a single simplification rule and returns. When used
with
/// TreeNodeRewriter, multiple passes will automatically be applied until no
more
/// transformations are possible.
+#[deprecated(
+ since = "53.0.0",
+ note = "This function will be made private in a future release, please
file an issue if you have a reason for keeping it public."
+)]
pub fn simplify_not_expr(
expr: Arc<dyn PhysicalExpr>,
schema: &Schema,
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]