alamb opened a new issue, #9289: URL: https://github.com/apache/arrow-datafusion/issues/9289
### Is your feature request related to a problem or challenge?

As part of porting `arrow_cast` (https://github.com/apache/arrow-datafusion/issues/9287) and `make_array` (https://github.com/apache/arrow-datafusion/issues/9288), it has become clear that some functions have special simplification semantics:

1. `arrow_cast` simplifies to a `cast` to a particular data type. It is important for the rest of DataFusion to understand the `Cast` semantics, as there are several special cases for `Expr::Cast` (such as [`unwrap_cast_in_comparison`](https://github.com/apache/arrow-datafusion/blob/ee9736fc81b4461d417b409baeaeb9f5e01ad962/datafusion/optimizer/src/unwrap_cast_in_comparison.rs#L40)).
2. `make_array` has special rewrite rules to combine / fold with `array_append` and `array_prepend` in the [simplifier (source link)](https://github.com/apache/arrow-datafusion/blob/ee9736fc81b4461d417b409baeaeb9f5e01ad962/datafusion/optimizer/src/analyzer/rewrite_expr.rs#L135-L146).

Also, I think some systems may want to provide the ability to define UDFs that are more like macros (can be expressed in terms of other built-in expressions), which needs some way for DataFusion to "inline" the definition.

Similarly, specialized functions (e.g. replacing `regexp_match` with a version that has the regexp pre-compiled ...) sound very similar: https://github.com/apache/arrow-datafusion/issues/8051

### Describe the solution you'd like

I propose adding a function such as the following. With it, we could implement the simplifications for `make_array` and `arrow_cast` in the UDF (and not in the main simplify code):

```rust
/// Was the expression simplified?
enum Simplified {
    /// The function call was simplified to an entirely new Expr
    Rewritten(Expr),
    /// The function call could not be simplified, and the arguments
    /// are returned unmodified
    Original(Vec<Expr>),
}
```

```rust
pub trait ScalarUDFImpl {
    ...
    /// Apply any function-specific simplification rules
    ///
    /// Some functions like arrow_cast have special semantic simplifications
    /// (into `Expr::Cast` for example) that can improve planning.
    ///
    /// If there is a simpler representation of calling this function with the
    /// specified arguments, return `Simplified::Rewritten` with the simplification.
    /// If no such simplification was possible, return `Simplified::Original` with
    /// the unmodified arguments (the default).
    ///
    /// This function should only apply simplifications specific to this function.
    /// DataFusion will automatically simplify the arguments with a variety
    /// of rewrites during optimization.
    fn simplify(&self, args: Vec<Expr>) -> Result<Simplified> {
        Ok(Simplified::Original(args))
    }
    ...
}
```

### Describe alternatives you've considered

There may be better

### Additional context

_No response_
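
To make the proposed `simplify` hook more concrete, here is a minimal, self-contained sketch of how an `arrow_cast`-style UDF might use it. Everything in it is a toy stand-in and an assumption for illustration: the `Expr` and `DataType` enums are drastically reduced versions that do not match the real DataFusion types, the error type is a plain `String`, the type-name parsing only knows three names, and `simplify_call` is a hypothetical driver standing in for whatever optimizer pass would invoke the hook.

```rust
// Toy stand-ins for illustration only; NOT the real DataFusion definitions.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// A string literal (the only literal kind this sketch needs)
    Literal(String),
    Cast { expr: Box<Expr>, data_type: DataType },
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Was the expression simplified? (same shape as in the proposal)
#[derive(Debug, PartialEq)]
enum Simplified {
    Rewritten(Expr),
    Original(Vec<Expr>),
}

trait ScalarUDFImpl {
    fn name(&self) -> &str;

    /// Default: no function-specific simplification, hand the arguments back.
    fn simplify(&self, args: Vec<Expr>) -> Result<Simplified, String> {
        Ok(Simplified::Original(args))
    }
}

/// Hypothetical UDF mirroring `arrow_cast(expr, 'TypeName')`.
struct ArrowCast;

impl ScalarUDFImpl for ArrowCast {
    fn name(&self) -> &str {
        "arrow_cast"
    }

    fn simplify(&self, mut args: Vec<Expr>) -> Result<Simplified, String> {
        if args.len() != 2 {
            return Err(format!("arrow_cast expects 2 arguments, got {}", args.len()));
        }
        // Only rewrite when the second argument is a literal naming a known type.
        let data_type = match &args[1] {
            Expr::Literal(s) => match s.as_str() {
                "Int32" => Some(DataType::Int32),
                "Int64" => Some(DataType::Int64),
                "Utf8" => Some(DataType::Utf8),
                _ => None,
            },
            _ => None,
        };
        match data_type {
            Some(data_type) => {
                // Take ownership of the first argument and wrap it in a Cast.
                let expr = Box::new(args.swap_remove(0));
                Ok(Simplified::Rewritten(Expr::Cast { expr, data_type }))
            }
            None => Ok(Simplified::Original(args)),
        }
    }
}

/// Hypothetical driver: how an optimizer pass might consume the hook.
fn simplify_call(udf: &dyn ScalarUDFImpl, args: Vec<Expr>) -> Result<Expr, String> {
    match udf.simplify(args)? {
        Simplified::Rewritten(expr) => Ok(expr),
        // Nothing to simplify: rebuild the original function call.
        Simplified::Original(args) => Ok(Expr::ScalarFunction {
            name: udf.name().to_string(),
            args,
        }),
    }
}
```

In this sketch, calling `simplify_call(&ArrowCast, ...)` with a column and the literal `"Int64"` yields an `Expr::Cast`, which is the form that rules such as `unwrap_cast_in_comparison` already understand. Returning the arguments inside `Simplified::Original` lets the caller rebuild the call without cloning them when no rewrite applies.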
