timsaucer opened a new pull request, #22340:
URL: https://github.com/apache/datafusion/pull/22340

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   `DefaultPhysicalPlanner::create_initial_plan` registers scalar-subquery 
state for expression lowering by cloning the `SessionState` and mutating the 
embedded `ExecutionProps`:
   
   ```rust
   let mut owned = session_state.clone();
   owned.execution_props_mut().subquery_indexes = index_map;
   owned.execution_props_mut().subquery_results = results.clone();
   ```
   
   The lowering side then reads those fields out of `ExecutionProps` in 
`physical-expr/src/planner.rs` (`Expr::ScalarSubquery` branch). This is a 
documented hack — the block comment at the write site says it should live in a 
dedicated planning context, but `create_physical_expr` only takes 
`&ExecutionProps`, so `ExecutionProps` was the cheapest carrier.
   
   That side channel is also what blocks moving `QueryPlanner` / 
`PhysicalPlanner` to `&dyn Session` (the goal of #20249 / #22151):
   
   - `Session` trait has no `clone()` — trait objects can't be cloned without 
extra machinery.
   - `Session::execution_props(&self)` is immutable on purpose; `Session` is 
`&self` from hot `TableProvider` paths.
   
   So before any `&dyn Session` flip is even possible, this side channel needs 
to go.
   
   ## What changes are included in this PR?
   
   - Introduces `SubqueryContext` in `datafusion-expr/src/execution_props.rs` — 
a small carrier bundling the `Subquery -> SubqueryIndex` map and the shared 
`ScalarSubqueryResults`.
   - Threads `&SubqueryContext` explicitly through:
     - `DefaultPhysicalPlanner::create_initial_plan_inner`, `task_helper`, 
`map_logical_node_to_physical`
     - `create_grouping_physical_expr`, `merge_grouping_set_physical_expr`, 
`create_cube_physical_expr`, `create_rollup_physical_expr`
     - `get_null_physical_expr_pair`, `get_physical_expr_pair`, 
`create_project_physical_exec_with_props`
   - Adds `*_with_subquery_context` dual entry points alongside every existing 
public expression-lowering function:
     - `create_physical_expr_with_subquery_context` / 
`create_physical_exprs_with_subquery_context` (physical-expr/planner.rs)
     - `create_physical_sort_expr_with_subquery_context` / 
`create_physical_sort_exprs_with_subquery_context` 
(physical-expr/physical_expr.rs)
     - `create_window_expr_with_subquery_context` / 
`create_window_expr_with_name_and_subquery_context` (core/physical_planner.rs)
     - `LoweredAggregateBuilder::with_subquery_context(...)` 
(physical-expr/aggregate.rs)
   - Drops the unreleased `subquery_indexes` and `subquery_results` fields from 
`ExecutionProps`.
   - Removes the `SessionState.clone()` + `execution_props_mut()` mutation at 
the write site.
   
   ### Why dual entry points instead of changing the existing signatures?
   
   The first instinct is to just add a 4th parameter to `create_physical_expr`:
   
   ```rust
   pub fn create_physical_expr(
       e: &Expr,
       schema: &DFSchema,
       props: &ExecutionProps,
       subquery_ctx: &SubqueryContext, // new
   ) -> Result<Arc<dyn PhysicalExpr>>
   ```
   
   Cleaner end state, but the cost is wrong:
   
   - `create_physical_expr` has **~92 production callers** in this repo and is 
also reached by downstream crates (datafusion-comet, ballista, 
datafusion-python, sqllogictest, custom planners). All of them would have to 
add the new argument.
   - **None of those callers actually need scalar-subquery support.** Only one 
read site cares: `physical-expr/src/planner.rs` `Expr::ScalarSubquery` branch. 
That branch is only reached when the caller has registered uncorrelated 
subqueries — which only the physical planner ever does.
   - Forcing every caller to pass a context they don't use is API churn with no 
payoff.
   
   The dual-entry-point shape lets the existing signature keep working for 
everyone:
   
   ```rust
   pub fn create_physical_expr(
       e: &Expr,
       schema: &DFSchema,
       props: &ExecutionProps,
   ) -> Result<Arc<dyn PhysicalExpr>> {
       create_physical_expr_with_subquery_context(
           e,
           schema,
           props,
           &SubqueryContext::default(),
       )
   }
   ```
   
   External callers who never construct a `SubqueryContext` get the same 
behavior they had before: `Expr::ScalarSubquery` lowers to `not_impl_err` 
(matching today's behavior when `subquery_indexes` was empty). The physical 
planner — the only caller that needs subquery state — uses the 
`_with_subquery_context` variant.
   
   The trade-off is one extra public function per lowering entry point in 
exchange for **zero external breakage**. With the `&dyn Session` refactor still 
ahead of us and likely needing its own API moves, keeping this PR additive is 
the right shape.
   
   The unreleased status of `ExecutionProps.subquery_indexes` / 
`subquery_results` (added in #21240, after tag 48.0.0-rc2) is what makes the 
removal half free: they were never in a release, so there's nothing to 
deprecate.
   
   ## Are these changes tested?
   
   Existing test coverage exercises the new code path:
   
   - `cargo test -p datafusion --lib` — 422 tests pass, including:
     - `physical_planner::tests::scalar_subquery_in_sort_expr_plans`
     - `physical_planner::tests::scalar_subquery_in_sort_expr_executes`
     - 
`physical_planner::tests::scalar_subquery_mixed_correlated_and_uncorrelated_executes`
   - `cargo test -p datafusion-proto --test proto_integration` — 
scalar-subquery roundtrip tests pass:
     - `roundtrip_scalar_subquery_exec`
     - `roundtrip_nested_scalar_subquery_exec_scopes_results`
     - `roundtrip_scalar_subquery_exec_with_default_converter_executes`
   - `cargo test -p datafusion-expr -p datafusion-physical-expr` — all pass.
   - `cargo clippy --all-targets --all-features -- -D warnings` clean.
   
   No new tests added: the SubqueryContext threading is a structural refactor 
of existing behavior. All scalar-subquery semantics are unchanged and are 
already covered by the tests above.
   
   ## Are there any user-facing changes?
   
   **Public API additions** (additive, non-breaking):
   
   - New type: `datafusion_expr::execution_props::SubqueryContext`.
   - New functions in `datafusion-physical-expr`:
     - `create_physical_expr_with_subquery_context`
     - `create_physical_exprs_with_subquery_context`
     - `create_physical_sort_expr_with_subquery_context`
     - `create_physical_sort_exprs_with_subquery_context`
   - New functions in `datafusion::physical_planner`:
     - `create_window_expr_with_subquery_context`
     - `create_window_expr_with_name_and_subquery_context`
   - New builder method: `LoweredAggregateBuilder::with_subquery_context`.
   
   **Removals** (unreleased — never shipped):
   
   - `ExecutionProps::subquery_indexes` field.
   - `ExecutionProps::subquery_results` field.
   
   Both were added after tag 48.0.0-rc2 (in #21240) and have never been in a 
published release, so there is no deprecation cycle required.
   
   No behavior changes for any pre-existing public API. Every original function 
preserves its signature and delegates with `SubqueryContext::default()`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to