gstvg commented on issue #21231:
URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4178732253

   Since lambda variables are also evaluated via the batch, `CaseWhen` 
projection must see them as well. `MyCustomColumn` could return unbound columns 
so `CaseWhen` could see them, but others traversals that aren't lambda aware 
would see them too and would off course break.
   
   @rluvaton `children_in_scope/with_new_children_in_scope` does work, but I 
believe it would actually require new `TreeNode` methods, `apply_in_scope` for 
`CaseWhen`, but others should be added as well. And if the regular 
`children/with_new_children` don't report the `external_references`, existing 
traversals that look for `Column`s would need use them instead, and also I 
believe there would be no way to do a traversal that contains both the bound 
`Column`s as well lambda bodies, only one with columns but without lambdas 
(`_in_scope` versions) or another with lambdas but without columns (existing 
ones).
   
   When lambda support arrives with `CaseWhen` being aware of it, and @rluvaton 
can use it instead of it's custom implementation (which I believe will happen 
since he's helping me implement it on #18921), then this became a non issue, 
right? I don't think there will be any other case of custom column-like 
expressions besides regular columns and lambda variables. And then, If someone 
needs either a custom `Column` or `LambdaVariable` (but not something else), 
then it can return the core version as children as proposed by @alamb
   
   I have implemented lambda column capture support compatible with `CaseWhen` 
including the same optimization as `CaseWhen` at #21323 
([tests](https://github.com/apache/datafusion/pull/21323/changes/5c5ca195d1fc2c708a1e4964e8392337fc3b0b02#diff-82f03654507864698a026045144642b3dcda8aedc20d29aea782f28b47b9d0ea)),
 without requiring any major or public facing changes. Every `Column` present 
on lambda body exposed via `children` refers to correct index of the outermost, 
"root" schema.
   
   As a last resort, we can evaluate `LambdaVariable` without the batch by 
adding a `Option<ArrayRef>` to it that must be set before evaluation:
   
   ```rust
   struct LambdaVariable {
       name: String,
       field: FieldRef,
       value: Option<ArrayRef>,
   }
   ```
   
   It would require a tree rewrite for every batch evaluation (internally 
handled by `HigherOrderFunctionExpr`) but would free others parts of the code 
to care about lambdas at all


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to