timsaucer opened a new pull request, #22340:
URL: https://github.com/apache/datafusion/pull/22340
## Which issue does this PR close?
- Closes #.
## Rationale for this change
`DefaultPhysicalPlanner::create_initial_plan` registers scalar-subquery
state for expression lowering by cloning the `SessionState` and mutating the
embedded `ExecutionProps`:
```rust
let mut owned = session_state.clone();
owned.execution_props_mut().subquery_indexes = index_map;
owned.execution_props_mut().subquery_results = results.clone();
```
The lowering side then reads those fields out of `ExecutionProps` in
`physical-expr/src/planner.rs` (`Expr::ScalarSubquery` branch). This is a
documented hack — the block comment at the write site says it should live in a
dedicated planning context, but `create_physical_expr` only takes
`&ExecutionProps`, so `ExecutionProps` was the cheapest carrier.
That side channel is also what blocks moving `QueryPlanner` /
`PhysicalPlanner` to `&dyn Session` (the goal of #20249 / #22151):
- `Session` trait has no `clone()` — trait objects can't be cloned without
extra machinery.
- `Session::execution_props(&self)` is immutable on purpose; `Session` is
`&self` from hot `TableProvider` paths.
So before any `&dyn Session` flip is even possible, this side channel needs
to go.
## What changes are included in this PR?
- Introduces `SubqueryContext` in `datafusion-expr/src/execution_props.rs` —
a small carrier bundling the `Subquery -> SubqueryIndex` map and the shared
`ScalarSubqueryResults`.
- Threads `&SubqueryContext` explicitly through:
- `DefaultPhysicalPlanner::create_initial_plan_inner`, `task_helper`,
`map_logical_node_to_physical`
- `create_grouping_physical_expr`, `merge_grouping_set_physical_expr`,
`create_cube_physical_expr`, `create_rollup_physical_expr`
- `get_null_physical_expr_pair`, `get_physical_expr_pair`,
`create_project_physical_exec_with_props`
- Adds `*_with_subquery_context` dual entry points alongside every existing
public expression-lowering function:
- `create_physical_expr_with_subquery_context` /
`create_physical_exprs_with_subquery_context` (physical-expr/planner.rs)
- `create_physical_sort_expr_with_subquery_context` /
`create_physical_sort_exprs_with_subquery_context`
(physical-expr/physical_expr.rs)
- `create_window_expr_with_subquery_context` /
`create_window_expr_with_name_and_subquery_context` (core/physical_planner.rs)
- `LoweredAggregateBuilder::with_subquery_context(...)`
(physical-expr/aggregate.rs)
- Drops the unreleased `subquery_indexes` and `subquery_results` fields from
`ExecutionProps`.
- Removes the `SessionState.clone()` + `execution_props_mut()` mutation at
the write site.
### Why dual entry points instead of changing the existing signatures?
The first instinct is to just add a 4th parameter to `create_physical_expr`:
```rust
pub fn create_physical_expr(
e: &Expr,
schema: &DFSchema,
props: &ExecutionProps,
subquery_ctx: &SubqueryContext, // new
) -> Result<Arc<dyn PhysicalExpr>>
```
Cleaner end state, but the cost is wrong:
- `create_physical_expr` has **~92 production callers** in this repo and is
also reached by downstream crates (datafusion-comet, ballista,
datafusion-python, sqllogictest, custom planners). All of them would have to
add the new argument.
- **None of those callers actually need scalar-subquery support.** Only one
read site cares: `physical-expr/src/planner.rs` `Expr::ScalarSubquery` branch.
That branch is only reached when the caller has registered uncorrelated
subqueries — which only the physical planner ever does.
- Forcing every caller to pass a context they don't use is API churn with no
payoff.
The dual-entry-point shape lets the existing signature keep working for
everyone:
```rust
pub fn create_physical_expr(
e: &Expr,
schema: &DFSchema,
props: &ExecutionProps,
) -> Result<Arc<dyn PhysicalExpr>> {
create_physical_expr_with_subquery_context(
e,
schema,
props,
&SubqueryContext::default(),
)
}
```
External callers who never construct a `SubqueryContext` get the same
behavior they had before: `Expr::ScalarSubquery` lowers to `not_impl_err`
(matching today's behavior when `subquery_indexes` was empty). The physical
planner — the only caller that needs subquery state — uses the
`_with_subquery_context` variant.
The trade-off is one extra public function per lowering entry point in
exchange for **zero external breakage**. With the `&dyn Session` refactor still
ahead of us and likely needing its own API moves, keeping this PR additive is
the right shape.
The unreleased status of `ExecutionProps.subquery_indexes` /
`subquery_results` (added in #21240, after tag 48.0.0-rc2) is what makes the
removal half free: they were never in a release, so there's nothing to
deprecate.
## Are these changes tested?
Existing test coverage exercises the new code path:
- `cargo test -p datafusion --lib` — 422 tests pass, including:
- `physical_planner::tests::scalar_subquery_in_sort_expr_plans`
- `physical_planner::tests::scalar_subquery_in_sort_expr_executes`
-
`physical_planner::tests::scalar_subquery_mixed_correlated_and_uncorrelated_executes`
- `cargo test -p datafusion-proto --test proto_integration` —
scalar-subquery roundtrip tests pass:
- `roundtrip_scalar_subquery_exec`
- `roundtrip_nested_scalar_subquery_exec_scopes_results`
- `roundtrip_scalar_subquery_exec_with_default_converter_executes`
- `cargo test -p datafusion-expr -p datafusion-physical-expr` — all pass.
- `cargo clippy --all-targets --all-features -- -D warnings` clean.
No new tests added: the SubqueryContext threading is a structural refactor
of existing behavior. All scalar-subquery semantics are unchanged and are
already covered by the tests above.
## Are there any user-facing changes?
**Public API additions** (additive, non-breaking):
- New type: `datafusion_expr::execution_props::SubqueryContext`.
- New functions in `datafusion-physical-expr`:
- `create_physical_expr_with_subquery_context`
- `create_physical_exprs_with_subquery_context`
- `create_physical_sort_expr_with_subquery_context`
- `create_physical_sort_exprs_with_subquery_context`
- New functions in `datafusion::physical_planner`:
- `create_window_expr_with_subquery_context`
- `create_window_expr_with_name_and_subquery_context`
- New builder method: `LoweredAggregateBuilder::with_subquery_context`.
**Removals** (unreleased — never shipped):
- `ExecutionProps::subquery_indexes` field.
- `ExecutionProps::subquery_results` field.
Both were added after tag 48.0.0-rc2 (in #21240) and have never been in a
published release, so there is no deprecation cycle required.
No behavior changes for any pre-existing public API. Every original function
preserves its signature and delegates with `SubqueryContext::default()`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]