Adam-Alani opened a new pull request, #22853:
URL: https://github.com/apache/datafusion/pull/22853

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   Extracted from #22689 per maintainer request (the breaking change parts 
deserve their own focused review, separate from the additive UDF).
   
   \`LambdaExpr\` previously compressed the column-index projection by 
enumerating every referenced index and packing them into the front of the 
merged evaluation batch. That collapse is correct for outer captures (and for 
single-parameter lambdas, where it happens to be a no-op), but it also moves 
lambda parameters around. A two-parameter lambda like \`(k, v) -> v\` (where 
\`k\` is unused) would have its \`LambdaVariable\` for \`v\` re-projected from 
index 1 to index 0 — so at runtime it would read the slot the higher-order 
function had filled with \`k\` and silently return the wrong column.
   
   This is a latent bug today (no in-tree higher-order function exercises it), 
but it blocks #22689 (\`transform_values\`, which uses \`(k, v) -> body\` 
lambdas) and any future HOF that takes more than one parameter.
   
   ## What changes are included in this PR?
   
   - \`LambdaExpr\` now tracks an \`outer_columns_count: usize\` field equal to 
the size of the outer DFSchema the lambda was planned against.
   - Indices strictly below that boundary are treated as outer captures and 
compressed to the front of the projected batch (existing behavior).
   - Indices at or above that boundary are treated as lambda parameters and 
keep their fixed offset from the start of the parameter block — so an unused 
parameter leaves a gap rather than shifting later ones.
   - \`create_physical_expr\` for \`Expr::Lambda\` computes 
\`outer_columns_count\` as 
\`input_dfschema.fields().len().saturating_sub(params.len())\`, matching how 
the logical planner appends lambda parameter fields to the outer schema.
   
   This is a **breaking change** to two public APIs in 
\`datafusion-physical-expr\`:
   
   - \`LambdaExpr::try_new(params, body)\` → \`LambdaExpr::try_new(params, 
body, outer_columns_count)\`
   - \`expressions::lambda(params, body)\` → \`expressions::lambda(params, 
body, outer_columns_count)\`
   
   \`cargo-semver-checks\` will flag both.
   
   ## Are these changes tested?
   
   Yes, two new unit tests in 
\`datafusion/physical-expr/src/expressions/lambda.rs\`:
   
   - \`test_multi_param_lambda_keeps_param_positions_stable\` — a \`(k, v) -> 
v\` lambda with no outer captures keeps its \`LambdaVariable\` referencing 
index 1 instead of being collapsed to index 0.
   - \`test_lambda_compresses_outer_captures_but_pins_params\` — with 5 outer 
columns and a two-parameter lambda, an unused first parameter still leaves its 
slot in the projected batch, so the used second parameter ends up at 
\`used_captures + (orig - outer_columns_count)\` rather than being shifted 
forward.
   
   Existing \`test_lambda_evaluate\` / \`test_lambda_duplicate_name\` are 
updated for the new signature and continue to pass. \`cargo test -p 
datafusion-physical-expr\` and \`cargo clippy -p datafusion-physical-expr 
--all-features --all-targets -- -D warnings\` both succeed.
   
   ## Are there any user-facing changes?
   
   Yes — see the breaking change notes above. Adding the \`api change\` label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to