gstvg commented on issue #21231: URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4178732253
Since lambda variables are also evaluated via the batch, `CaseWhen` projection must see them as well. `MyCustomColumn` could return unbound columns so `CaseWhen` could see them, but others traversals that aren't lambda aware would see them too and would off course break. @rluvaton `children_in_scope/with_new_children_in_scope` does work, but I believe it would actually require new `TreeNode` methods, `apply_in_scope` for `CaseWhen`, but others should be added as well. And if the regular `children/with_new_children` don't report the `external_references`, existing traversals that look for `Column`s would need use them instead, and also I believe there would be no way to do a traversal that contains both the bound `Column`s as well lambda bodies, only one with columns but without lambdas (`_in_scope` versions) or another with lambdas but without columns (existing ones). When lambda support arrives with `CaseWhen` being aware of it, and @rluvaton can use it instead of it's custom implementation (which I believe will happen since he's helping me implement it on #18921), then this became a non issue, right? I don't think there will be any other case of custom column-like expressions besides regular columns and lambda variables. And then, If someone needs either a custom `Column` or `LambdaVariable` (but not something else), then it can return the core version as children as proposed by @alamb I have implemented lambda column capture support compatible with `CaseWhen` including the same optimization as `CaseWhen` at #21323 ([tests](https://github.com/apache/datafusion/pull/21323/changes/5c5ca195d1fc2c708a1e4964e8392337fc3b0b02#diff-82f03654507864698a026045144642b3dcda8aedc20d29aea782f28b47b9d0ea)), without requiring any major or public facing changes. Every `Column` present on lambda body exposed via `children` refers to correct index of the outermost, "root" schema. As a last resort, we can evaluate `LambdaVariable` without the batch by adding a `Option<ArrayRef>` to it that must be set before evaluation: ```rust struct LambdaVariable { name: String, field: FieldRef, value: Option<ArrayRef>, } ``` It would require a tree rewrite for every batch evaluation (internally handled by `HigherOrderFunctionExpr`) but would free others parts of the code to care about lambdas at all -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
