Adam-Alani commented on code in PR #22853:
URL: https://github.com/apache/datafusion/pull/22853#discussion_r3472690852
##########
datafusion/physical-expr/src/expressions/lambda.rs:
##########
@@ -149,6 +161,66 @@ impl LambdaExpr {
pub(crate) fn projected_body(&self) -> &Arc<dyn PhysicalExpr> {
&self.projected_body
}
+
+ /// Subset of [`params`](Self::params) (by name) that the body actually
+ /// references, taking nested-lambda shadowing into account. Used by the
+ /// higher-order function evaluator to skip evaluating/pushing parameters
+ /// the lambda body does not need, so that unused declared parameters do
+ /// not shift the merged batch's column positions out of sync with the
+ /// body's compressed indices.
+ pub fn used_params(&self) -> &HashSet<String> {
+ &self.used_params
+ }
+}
+
+/// Walk the body collecting:
+///
+/// * `used_indices` — every `Column` / `LambdaVariable` index referenced
+/// anywhere in the tree (including inside nested lambdas). This drives the
+/// `projection` used to slice the outer batch.
+/// * `used_param_names` — the subset of *this* lambda's `own_params` that the
+/// body actually references, with nested-lambda parameters shadowing the
+/// outer ones. For example, in `(k, v) -> func(col, (k, v2) -> k + v2 + v)`
+/// the inner `k` shadows the outer `k`, so only `v` flows up as used.
+fn collect_used(
+ node: &Arc<dyn PhysicalExpr>,
+ own_params: &HashSet<String>,
+ used_indices: &mut HashSet<usize>,
+ used_param_names: &mut HashSet<String>,
+ shadow_stack: &mut Vec<HashSet<String>>,
+) {
+ if let Some(col) = node.downcast_ref::<Column>() {
+ used_indices.insert(col.index());
+ } else if let Some(var) = node.downcast_ref::<LambdaVariable>() {
+ used_indices.insert(var.index());
+
+ let name = var.name();
+ let shadowed = shadow_stack.iter().any(|frame| frame.contains(name));
+ if !shadowed && own_params.contains(name) {
+ used_param_names.insert(name.to_string());
+ }
+ }
+
+ let pushed = if let Some(nested) = node.downcast_ref::<LambdaExpr>() {
+ shadow_stack.push(nested.params.iter().cloned().collect());
+ true
+ } else {
+ false
+ };
+
+ for child in node.children() {
+ collect_used(
+ child,
+ own_params,
+ used_indices,
+ used_param_names,
+ shadow_stack,
+ );
+ }
+
+ if pushed {
+ shadow_stack.pop();
+ }
}
Review Comment:
Refactored — replaced the hand-rolled recursion with a `TreeNodeVisitor`
(`CollectUsedVisitor`), with `f_down` doing the column-index/used-param
collection and pushing the shadow frame when entering a nested `LambdaExpr`,
and `f_up` popping it. Tests still pass.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]