Re: [PR] [RFC] Add lambda support and array_transform udf [datafusion]

via GitHub Mon, 23 Feb 2026 03:33:35 -0800


rluvaton commented on PR #18921:
URL: https://github.com/apache/datafusion/pull/18921#issuecomment-3944235156


   Thanks a lot for this PR.
   
   A couple of things to make sure we support, or at least have a way to add in 
the future without breaking changes:
   1. Support `Map`, `(Large)List`, `(Large)ListView`, `FixedSizeList` as the 
input for a lambda
   2. Multiple lambdas in a single expression, for example 
`map_key_value(some_map_col, map_key_lambda, map_value_lambda)`, where each 
lambda gets different variables
   3. Lambda expressions that access columns that are not in the list itself, 
      so I can do the following:
       ```
       | year |    grades      |
       |------|----------------|
       | 1998 | [1, 2, 3]      |
       | 1999 | [4, 99, 5, 10] |
       | 2000 | [6, 0, null]   |
       ```
       `array_transform(grades, x -> if year <= 1990 then x * 10 else x)`
   4. Optional arguments for the lambda, for example the index of the item in 
the list.
      The optional part is important, as I want to avoid creating that input if 
I don't need it.
   5. Nested lambda expressions: `array_transform(matrix, x -> 
array_transform(x, y -> y * 2))`
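
To make point 3 concrete, here is a minimal std-only sketch of a lambda that reads a column outside the list. It is a toy model, not DataFusion code: `array_transform` and `scale_old_grades` are hypothetical helpers over plain `Vec`s, and passing the row number to the closure is just one assumed way to expose outer columns. Null elements are assumed to pass through untouched.

```rust
/// Toy stand-in for `array_transform`: applies `f` to every non-null element.
/// `f` also receives the row number so it can look up columns outside the list.
fn array_transform<F>(lists: &[Vec<Option<i64>>], f: F) -> Vec<Vec<Option<i64>>>
where
    F: Fn(usize, i64) -> i64,
{
    lists
        .iter()
        .enumerate()
        .map(|(row, list)| list.iter().map(|&x| x.map(|v| f(row, v))).collect())
        .collect()
}

/// `array_transform(grades, x -> if year <= 1990 then x * 10 else x)`,
/// with `year` captured from the enclosing row rather than from the list.
fn scale_old_grades(years: &[i32], grades: &[Vec<Option<i64>>]) -> Vec<Vec<Option<i64>>> {
    array_transform(grades, |row, x| if years[row] <= 1990 { x * 10 } else { x })
}
```

The point of the sketch is only that the lambda body needs per-row access to `year`, so whatever API is chosen has to let the evaluator line up outer columns with list elements.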
   
   And some things that every lambda expression would need, for which we should 
provide helpers (rather than always fixing up the input, IMO, as that would be 
expensive and the user might have prior knowledge about the input, might want 
their own implementation, or might know the child lambda expression can't error)
   > (I have helpers for all of these)
   1. How we handle null lists when the underlying list is not empty and the 
expression can fail, for example: `array_transform(list, x -> 1 / x)`.
      In the input below, the second list is `null` but its 
underlying values are `[0, 3]`, so running the transform on them fails 
with division by zero.
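       ```rust
       use std::sync::Arc;
       use arrow::array::{GenericListArray, Int8Array};
       use arrow::buffer::{NullBuffer, OffsetBuffer};
       use arrow::datatypes::{DataType, Field};

       // Three lists: [1, 2], null (backed by [0, 3]), [4]
       fn get_list() -> GenericListArray<i32> {
         GenericListArray::new(
           Arc::new(Field::new_list_field(DataType::Int8, false)),
           OffsetBuffer::<i32>::from_lengths(vec![2, 2, 1]),
           Arc::new(Int8Array::from(vec![1, 2, 0, 3, 4])),
           Some(NullBuffer::from(&[true, false, true])),
         )
       }
       ```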
   
       I have a lot of helpers to clean up the nulls BTW
   2. How we handle sliced lists: the child should only work on the sliced data.
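
Both points can be illustrated with a small std-only model. This is a sketch under assumptions, not one of the helpers mentioned above: `ToyList`, `transform_all`, `transform_valid`, and `one_over` are hypothetical names, and the struct only mimics the offsets/validity/values layout of a list array.

```rust
/// Simplified list array: row `i` covers `values[offsets[i]..offsets[i + 1]]`
/// and is null when `validity[i]` is false.
struct ToyList {
    offsets: Vec<usize>,
    validity: Vec<bool>,
    values: Vec<i8>,
}

/// Naive evaluation: run the lambda over the whole child buffer,
/// including values hidden behind null rows or outside a slice.
fn transform_all(list: &ToyList, f: impl Fn(i8) -> Result<i8, String>) -> Result<Vec<i8>, String> {
    list.values.iter().map(|&v| f(v)).collect()
}

/// Evaluate only the values reachable from valid rows within the offsets;
/// null rows become empty ranges in the rebuilt child buffer.
fn transform_valid(list: &ToyList, f: impl Fn(i8) -> Result<i8, String>) -> Result<ToyList, String> {
    let mut offsets = vec![0];
    let mut values = Vec::new();
    for (row, &valid) in list.validity.iter().enumerate() {
        if valid {
            for &v in &list.values[list.offsets[row]..list.offsets[row + 1]] {
                values.push(f(v)?);
            }
        }
        offsets.push(values.len());
    }
    Ok(ToyList { offsets, validity: list.validity.clone(), values })
}

/// The fallible lambda body `x -> 1 / x`.
fn one_over(x: i8) -> Result<i8, String> {
    1i8.checked_div(x).ok_or_else(|| "division by zero".to_string())
}
```

With offsets `[0, 2, 4, 5]`, validity `[true, false, true]`, and values `[1, 2, 0, 3, 4]`, `transform_all` hits the hidden `0` and errors, while `transform_valid` never visits it. The same loop also covers slicing, because it reads only the ranges named by the offsets. The cost is rebuilding the child buffer, which is why this seems better offered as an opt-in helper than done unconditionally.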
   
   
   -----
   
   I think having a new `LambdaUDFImpl` is better than adding functions to the 
existing `ScalarUDF` because:
   1. The `ScalarUDF` trait will not grow too much, which keeps implementing 
regular scalar UDFs easy and implementing lambda UDFs less overwhelming
   2. If we need to add a required function only for lambdas, we can 
add it to the new trait with ease, without doing weird stuff
      to avoid breaking changes.
   3. Less ambiguity in the API.
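
A rough sketch of the separation argued for here. Everything in it is hypothetical (`ScalarUdfSketch`, `LambdaUdfSketch`, and `lambda_params` are made-up names, not DataFusion's traits), and it uses plain `Vec`s instead of Arrow arrays:

```rust
/// Hypothetical stand-in for a plain scalar UDF trait (not DataFusion's real API).
trait ScalarUdfSketch {
    fn name(&self) -> &str;
    fn invoke(&self, input: &[i64]) -> Vec<i64>;
}

/// Hypothetical separate trait for functions that take a lambda.
trait LambdaUdfSketch {
    fn name(&self) -> &str;
    /// A lambda-only requirement: how many parameters the lambda expects.
    /// Adding it here does not touch any `ScalarUdfSketch` implementor.
    fn lambda_params(&self) -> usize;
    fn invoke(&self, input: &[Vec<i64>], lambda: &dyn Fn(i64) -> i64) -> Vec<Vec<i64>>;
}

struct Negate;

impl ScalarUdfSketch for Negate {
    fn name(&self) -> &str {
        "negate"
    }
    fn invoke(&self, input: &[i64]) -> Vec<i64> {
        input.iter().map(|&v| -v).collect()
    }
}

struct ArrayTransform;

impl LambdaUdfSketch for ArrayTransform {
    fn name(&self) -> &str {
        "array_transform"
    }
    fn lambda_params(&self) -> usize {
        1
    }
    fn invoke(&self, input: &[Vec<i64>], lambda: &dyn Fn(i64) -> i64) -> Vec<Vec<i64>> {
        input
            .iter()
            .map(|row| row.iter().map(|&v| lambda(v)).collect())
            .collect()
    }
}
```

`Negate` never has to know that `lambda_params` exists, which is the breaking-change argument in point 2.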
   
   ----
   
   I want to keep the simplicity of `ScalarUDF`, which means that in order to 
evaluate a lambda expression I don't need to construct anything; I only need to 
provide the input and maybe some options for future use.
   


