Re: [PR] [RFC] Add lambda support and array_transform udf [datafusion]

via GitHub Sun, 14 Dec 2025 11:00:59 -0800


gstvg commented on PR #18921:
URL: https://github.com/apache/datafusion/pull/18921#issuecomment-3651816409


   Thanks @shehabgamin. I analyzed the Spark implementation one last time, and 
realized I could have done similarly since the beginning... I've already tested 
it locally and pretend to push soon, the diff reduced to ~2K LOC, mostly in 
new, self contained code with small changes in existing code and without 
requiring changes in downstream code. I hope you haven't put time reviewing it 
yet because I believe the spark approach is much better and also easier/faster 
to review
   
   Basically, for planning, we lazily set the Field in the lambda column 
itself, instead of injecting it into a DFSchema
   
   ```rust
   //logical
   struct LambdaColumn {
       name: String,
       field: Option<FieldRef>,
   }
   ```
   
   And for evaluation, again, we lazily set the value of a lambda column 
instead of injecting it into a RecordBatch
   
   ```rust
   //physical
   struct LambdaColumn {
       name: String,
       field: FieldRef,
       value: Option<ColumnarValue>,
   }
   ```
   
   I think I initially ruled that out because other expressions doesn't contain 
it's own Field, and this have been discussed before in #12604 *, and didn't 
went forward, which the difference that the other expressions Field's naturally 
exist in the plan schema, which *now* I believe justifies this difference.
   And most importantly, because I had no idea of how much work and downstream 
churn injecting the lambdas parameters into the Schema/RecordBatch would cause 
:facepalm: 
   
   I will push soon and update the PR description, thanks again!
   
   *This is being discussed again in #18845


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [RFC] Add lambda support and array_transform udf [datafusion]

Reply via email to