Adam-Alani opened a new pull request, #22853: URL: https://github.com/apache/datafusion/pull/22853
## Which issue does this PR close? - Closes #. ## Rationale for this change Extracted from #22689 per maintainer request (the breaking change parts deserve their own focused review, separate from the additive UDF). \`LambdaExpr\` previously compressed the column-index projection by enumerating every referenced index and packing them into the front of the merged evaluation batch. That collapse is correct for outer captures (and for single-parameter lambdas, where it happens to be a no-op), but it also moves lambda parameters around. A two-parameter lambda like \`(k, v) -> v\` (where \`k\` is unused) would have its \`LambdaVariable\` for \`v\` re-projected from index 1 to index 0 — so at runtime it would read the slot the higher-order function had filled with \`k\` and silently return the wrong column. This is a latent bug today (no in-tree higher-order function exercises it), but it blocks #22689 (\`transform_values\`, which uses \`(k, v) -> body\` lambdas) and any future HOF that takes more than one parameter. ## What changes are included in this PR? - \`LambdaExpr\` now tracks an \`outer_columns_count: usize\` field equal to the size of the outer DFSchema the lambda was planned against. - Indices strictly below that boundary are treated as outer captures and compressed to the front of the projected batch (existing behavior). - Indices at or above that boundary are treated as lambda parameters and keep their fixed offset from the start of the parameter block — so an unused parameter leaves a gap rather than shifting later ones. - \`create_physical_expr\` for \`Expr::Lambda\` computes \`outer_columns_count\` as \`input_dfschema.fields().len().saturating_sub(params.len())\`, matching how the logical planner appends lambda parameter fields to the outer schema. This is a **breaking change** to two public APIs in \`datafusion-physical-expr\`: - \`LambdaExpr::try_new(params, body)\` → \`LambdaExpr::try_new(params, body, outer_columns_count)\` - \`expressions::lambda(params, body)\` → \`expressions::lambda(params, body, outer_columns_count)\` \`cargo-semver-checks\` will flag both. ## Are these changes tested? Yes, two new unit tests in \`datafusion/physical-expr/src/expressions/lambda.rs\`: - \`test_multi_param_lambda_keeps_param_positions_stable\` — a \`(k, v) -> v\` lambda with no outer captures keeps its \`LambdaVariable\` referencing index 1 instead of being collapsed to index 0. - \`test_lambda_compresses_outer_captures_but_pins_params\` — with 5 outer columns and a two-parameter lambda, an unused first parameter still leaves its slot in the projected batch, so the used second parameter ends up at \`used_captures + (orig - outer_columns_count)\` rather than being shifted forward. Existing \`test_lambda_evaluate\` / \`test_lambda_duplicate_name\` are updated for the new signature and continue to pass. \`cargo test -p datafusion-physical-expr\` and \`cargo clippy -p datafusion-physical-expr --all-features --all-targets -- -D warnings\` both succeed. ## Are there any user-facing changes? Yes — see the breaking change notes above. Adding the \`api change\` label. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
