jayzhan211 opened a new issue, #13825:
URL: https://github.com/apache/datafusion/issues/13825

   ### Is your feature request related to a problem or challenge?
   
   Continue on the discussion from 
https://github.com/apache/datafusion/pull/13756#discussion_r1887762971
   
   We create scalar function with the defined function `udf` and inputs `args` 
and keep them in `Expr`. We then coerce the inputs in optimizer step with 
`TypeCoercionRewriter` and compute the `nullability` when we create physical 
expr `create_physical_expr`.
   
   Both `return_type` and `nullability` are computed based on the inputs which 
are defined by `ScalarUDFImpl` in logical layer. As long as we have `udf` and 
`inputs: Vec<Expr>` we can decide whether it has a valid types and what 
implicit coercion we should do. We can define the `return_type` and 
`nullablity` based on those information.
   
   I think such properties can be decided as early as possible (when the 
function is created in the logical layer).
   
   Once we have `return_type`, I think we can avoid calling 
`data_types_with_scalar_udf` again in `ExprSchemable::get_type`, 
`coerce_arguments_for_signature_with_scalar_udf`, and `create_physical_expr`.
   
   ### Describe the solution you'd like
   
   Compute `return_type` and `nullable` as early as possible, i.e. when the 
function is created in the logical layer.
   
   ```rust
   #[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)]
   pub struct ScalarFunction {
       /// The function
       pub func: Arc<crate::ScalarUDF>,
       /// List of expressions to feed to the functions as arguments
       pub args: Vec<Expr>,
       
        // New fields, computed once when the function is created
        /// The return type derived from `func` and `args`
        pub return_type: DataType,
        /// Whether the function's output can be null
        pub nullable: bool,
   }
   ```
   
   `ScalarFunction::new_udf` is used to create the function; we need to double-check 
whether there are other ways to create a scalar function, and make sure they all 
go through the same computation.
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   We create the physical expression from the logical inputs and schema, which 
doesn't seem correct either.
   
   ```rust
   /// Create a physical expression for the UDF.
   pub fn create_physical_expr(
       fun: &ScalarUDF,
       input_phy_exprs: &[Arc<dyn PhysicalExpr>],
       input_schema: &Schema,
       args: &[Expr],
       input_dfschema: &DFSchema,
   ) -> Result<Arc<dyn PhysicalExpr>> {
       let input_expr_types = input_phy_exprs
           .iter()
           .map(|e| e.data_type(input_schema))
           .collect::<Result<Vec<_>>>()?;
   
        // verify that the input data types are consistent with the
        // function's `TypeSignature`
        data_types_with_scalar_udf(&input_expr_types, fun)?;
    
        // Since we have `arg_types`, we don't need `args` and the schema.
        let return_type =
            fun.return_type_from_exprs(args, input_dfschema, &input_expr_types)?;
   
       Ok(Arc::new(
           ScalarFunctionExpr::new(
               fun.name(),
               Arc::new(fun.clone()),
               input_phy_exprs.to_vec(),
               return_type,
           )
           .with_nullable(fun.is_nullable(args, input_dfschema)),
       ))
   }
   ```
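   If the logical `ScalarFunction` already carried `return_type` and `nullable`, physical planning would not need the logical `args` or `input_dfschema` at all. A minimal sketch of that shape, using simplified stand-ins for `ScalarFunctionExpr` and the real constructor (the names and signature here are hypothetical, not the actual DataFusion API):
   
   ```rust
   #[derive(Clone, Debug, PartialEq)]
   enum DataType {
       Int64,
   }
   
   // Simplified stand-in for `ScalarFunctionExpr`.
   #[allow(dead_code)]
   struct PhysicalScalarExpr {
       name: String,
       return_type: DataType,
       nullable: bool,
   }
   
   // With the cached fields on the logical node, physical-expr creation
   // reduces to copying the precomputed properties across: no second
   // `TypeSignature` validation and no `DFSchema` lookup are needed.
   fn create_physical_expr(
       name: &str,
       return_type: DataType,
       nullable: bool,
   ) -> PhysicalScalarExpr {
       PhysicalScalarExpr {
           name: name.to_string(),
           return_type,
           nullable,
       }
   }
   
   fn main() {
       let expr = create_physical_expr("my_udf", DataType::Int64, true);
       assert_eq!(expr.return_type, DataType::Int64);
       assert!(expr.nullable);
   }
   ```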


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
