comphead commented on PR #18921:
URL: https://github.com/apache/datafusion/pull/18921#issuecomment-3946052171

   Thanks @rluvaton and @gstvg, it's nice that you mentioned `array_transform`. The tricky part of this function is that its return type depends on the lambda:
   
   ```
   array_transform(array<T>, function<T, U>) -> array<U>
   ```
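   
   As a rough analogy in plain Rust (not DataFusion code; `array_transform_generic` is a made-up name for illustration), the output element type `U` is not known from the input array alone and only gets fixed once the lambda is known:
   
   ```rust
   /// Toy analogy: the element type `U` of the result is decided by the
   /// lambda `f`, not by the input slice.
   fn array_transform_generic<T, U>(input: &[T], f: impl Fn(&T) -> U) -> Vec<U> {
       input.iter().map(f).collect()
   }
   ```
   
   Calling it with `|x| x.to_string()` on integers gives a `Vec<String>`, while `|x| x + 1` gives a `Vec<i32>`. In a query engine the same decision has to happen at planning time, from the lambda's resolved return type.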
   
   > I want to keep the simplicity of ScalarUDF which means that in order to 
evaluate a lambda expression I don't need to construct stuff, only need to 
provide the input and maybe some options for future use.
   
   Right, at a high level it could look like this:
   
   ```
   pub struct LambdaExpr {
       /// Parameter names/types already resolved
       pub param_types: Vec<DataType>,
   
       /// Expression body: what needs to be evaluated. It can potentially
       /// be a UDF.
       pub body: Arc<dyn PhysicalExpr>,
   }
   ```
   
   Implementation:
   
   ```
   impl LambdaExpr {
       pub fn new(
           param_types: Vec<DataType>,
           body: Arc<dyn PhysicalExpr>,
       ) -> Self {
           Self { param_types, body }
       }
   
       /// Evaluate lambda over provided arrays
       pub fn evaluate_with_args(
           &self,
           args: Vec<ArrayRef>,
       ) -> Result<ArrayRef> {
           // Build synthetic schema
           let fields: Vec<Field> = self.param_types
               .iter()
               .enumerate()
               .map(|(i, dt)| Field::new(format!("arg{}", i), dt.clone(), true))
               .collect();
   
           let schema = Arc::new(Schema::new(fields));
   
           let batch = RecordBatch::try_new(schema, args)?;
   
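           // This is where our UDF would be called. `evaluate` returns a
           // `ColumnarValue`, so convert it to an array of the batch length.
           self.body.evaluate(&batch)?.into_array(batch.num_rows())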
       }
   }
   ```
   
   So, for example, for `x -> x + 1` we need to parse the expression and create our lambda. That means modifying the parser to produce the structures below from user-written code, and there is an existing ticket for this: https://github.com/apache/datafusion-sqlparser-rs/issues/1273
   
   ```
   // Parameter x at column 0 (named after the synthetic schema field)
   let x = Arc::new(Column::new("arg0", 0));
   
   // Literal 1
   let one = Arc::new(Literal::new(
       ScalarValue::Int32(Some(1))
   ));
   
   // x + 1
   let body = Arc::new(BinaryExpr::new(
       x,
       Operator::Add,
       one,
   ));
   
   // Lambda(x) -> x + 1
   let lambda = LambdaExpr::new(
       vec![DataType::Int32],
       body,
   );
   ```
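   
   To make the "parameter is just column 0" idea concrete, here is a minimal std-only sketch. It is a toy model, not DataFusion's `PhysicalExpr`: the `Expr` enum, its `evaluate` method, and `x_plus_one` are all hypothetical names, and it handles only `i32` columns without nulls.
   
   ```rust
   // Toy model only: hypothetical stand-ins, not DataFusion types.
   #[derive(Debug, Clone)]
   enum Expr {
       /// Reference to the lambda parameter stored at this column index.
       Column(usize),
       /// Constant value, broadcast to every row.
       Literal(i32),
       /// Element-wise addition of two sub-expressions.
       Add(Box<Expr>, Box<Expr>),
   }
   
   impl Expr {
       /// Evaluate against a "batch": one Vec per lambda parameter,
       /// each expected to hold `num_rows` values.
       fn evaluate(&self, columns: &[Vec<i32>], num_rows: usize) -> Result<Vec<i32>, String> {
           match self {
               Expr::Column(i) => columns
                   .get(*i)
                   .cloned()
                   .ok_or_else(|| format!("no column at index {i}")),
               Expr::Literal(v) => Ok(vec![*v; num_rows]),
               Expr::Add(l, r) => {
                   let l = l.evaluate(columns, num_rows)?;
                   let r = r.evaluate(columns, num_rows)?;
                   Ok(l.iter().zip(r.iter()).map(|(a, b)| a + b).collect())
               }
           }
       }
   }
   
   /// Build the body of `x -> x + 1`, with `x` bound to column 0.
   fn x_plus_one() -> Expr {
       Expr::Add(Box::new(Expr::Column(0)), Box::new(Expr::Literal(1)))
   }
   ```
   
   The body never sees the name `x`. It only sees a positional column, which is why the synthetic schema built in `evaluate_with_args` is enough to run it.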
   
   and then call it from the built-in function that takes the lambda:
   
   ```
   fn array_transform(
       list_array: &ListArray,
       lambda: &LambdaExpr,
   ) -> Result<ListArray> {
   
       let values = list_array.values().clone();
   
       // evaluate lambda on flattened child array
       let transformed =
           lambda.evaluate_with_args(vec![values])?;
   
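       // The element type comes from the lambda's output, so build a new
       // list field instead of reusing the input's data type.
       let field = Arc::new(Field::new_list_field(
           transformed.data_type().clone(),
           true,
       ));
   
       Ok(ListArray::new(
           field,
           list_array.offsets().clone(),
           transformed,
           list_array.nulls().cloned(),
       ))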
   }
   ```
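   
   The "evaluate on the flattened child array and reuse the offsets" trick can be sketched without Arrow. This is only a toy model under stated assumptions: `ToyList` and `toy_array_transform` are hypothetical names, there is no null buffer, and the lambda is assumed to be element-wise (one output per input), which is what makes reusing the offsets valid.
   
   ```rust
   /// Toy list column: `offsets[i]..offsets[i + 1]` is the range of
   /// `values` that belongs to row `i`. Hypothetical stand-in for ListArray.
   #[derive(Debug, PartialEq)]
   struct ToyList<T> {
       offsets: Vec<usize>,
       values: Vec<T>,
   }
   
   impl<T> ToyList<T> {
       /// Elements of row `i`.
       fn row(&self, i: usize) -> &[T] {
           &self.values[self.offsets[i]..self.offsets[i + 1]]
       }
   }
   
   /// Apply `lambda` once to the whole flattened child buffer, then
   /// reuse the input offsets for the output.
   fn toy_array_transform<T, U>(
       list: &ToyList<T>,
       lambda: impl Fn(&[T]) -> Vec<U>,
   ) -> Result<ToyList<U>, String> {
       let transformed = lambda(&list.values);
       // Reusing offsets is only correct if the lambda preserved the length.
       if transformed.len() != list.values.len() {
           return Err(format!(
               "lambda returned {} values for {} inputs",
               transformed.len(),
               list.values.len()
           ));
       }
       Ok(ToyList {
           offsets: list.offsets.clone(),
           values: transformed,
       })
   }
   ```
   
   The lambda runs once over all elements rather than once per row, and the row boundaries carry over unchanged. A lambda that changes the number of elements would need a different approach, since the offsets would no longer line up.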