jorgecarleitao commented on pull request #8032:
URL: https://github.com/apache/arrow/pull/8032#issuecomment-679631810
> For built-in functions like `sqrt` I would expect DataFusion to provide
convenience functions to create an expression, like we do with `col` and the
aggregate functions. I assume we could also do that with the design proposed
here?
>
> For example, I would like to be able to write:
>
> ```rust
> df.select(vec![col("foo"), sqrt(col("bar"))])?
> ```
This PR does not support this, as it treats every function (built-in or
not) equally. To cover that case, IMO this PR needs to add new variants to the
logical `Expr` enum:
* `Expr::ScalarFunction { name: String/Enum, args: Vec<Expr> }`, whose return
type we know logically from `name` (e.g. `"sqrt"`), exactly like
`Expr::BinaryExpr`. This is mapped to a physical expression during planning.
These can be built without access to the registry, as we hard-code the return
type on the logical plan to be consistent with the physical one, like we do for
our aggregates, binary expressions, etc.
* `Expr::ScalarUDF { fun: ScalarFunction, args: Vec<Expr> }`, whose return
type is only known after consulting the registry to see what the user declared
it to be (as this PR currently does). Both variants are sketched below.
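
To make the proposal concrete, here is a minimal sketch of what the two variants could look like; the field names and the `ScalarFunction` placeholder struct are illustrative, not the actual DataFusion definitions:

```rust
use arrow::datatypes::DataType;

/// Placeholder for the user-registered UDF from this PR (illustrative only).
#[derive(Clone)]
pub struct ScalarFunction {
    pub name: String,
    pub return_type: DataType,
    // ... argument types, the function implementation, etc.
}

/// Sketch of the logical expression enum with the two proposed variants.
pub enum Expr {
    Column(String),
    /// Built-in function: the return type is derived from `name` during planning.
    ScalarFunction { name: String, args: Vec<Expr> },
    /// User-defined function: the return type comes from the registered `ScalarFunction`.
    ScalarUDF { fun: ScalarFunction, args: Vec<Expr> },
    // ... other variants (BinaryExpr, Literal, aggregates, ...)
}
```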
`sqrt` would return an `Expr::ScalarFunction`, which knows its own return type,
and the planner converts it to a physical `ScalarFunction` via a hard-coded
map, while physical planning of an `Expr::ScalarUDF` is just planning its
`args` and passing them to the physical `ScalarFunction`, as this PR already does.
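
Under that split, the `sqrt` helper from the quoted snippet could be written without any registry access (a sketch building on the types above; the helper names follow the quoted example):

```rust
/// Convenience constructor for the built-in `sqrt` (sketch).
/// No registry lookup is needed: the planner resolves "sqrt" itself.
pub fn sqrt(arg: Expr) -> Expr {
    Expr::ScalarFunction {
        name: "sqrt".to_string(),
        args: vec![arg],
    }
}

/// Convenience constructor for a column reference, as in `col("bar")`.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
```

so that `sqrt(col("bar"))` builds a purely logical node whose return type the planner can later fill in from the hard-coded map.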
I.e. at the physical level, built-ins and UDFs are indistinguishable, but in
the logical plan one only knows its name (the built-in), while the other also
carries its physical representation, the `ScalarUDF`.
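
A rough sketch of that planning step, building on the types above; `PhysicalScalarFunction`, `lookup_builtin`, and `plan_scalar` are invented names for illustration, not the planner's real API:

```rust
/// Placeholder physical node (sketch): both logical variants end up here.
pub struct PhysicalScalarFunction {
    pub fun: ScalarFunction,
    pub args: Vec<PhysicalScalarFunction>, // simplified; real args are physical expressions
}

/// Hard-coded map from a built-in name to its implementation and return type.
fn lookup_builtin(name: &str) -> Result<ScalarFunction, String> {
    match name {
        "sqrt" => Ok(ScalarFunction {
            name: "sqrt".to_string(),
            return_type: DataType::Float64,
        }),
        other => Err(format!("unknown built-in function: {}", other)),
    }
}

/// Plan a logical scalar function: built-ins and UDFs become the same physical node.
fn plan_scalar(expr: &Expr) -> Result<PhysicalScalarFunction, String> {
    let plan_args = |args: &[Expr]| -> Result<Vec<PhysicalScalarFunction>, String> {
        args.iter().map(plan_scalar).collect()
    };
    match expr {
        // Built-in: resolve the name through the hard-coded map, then plan the args.
        Expr::ScalarFunction { name, args } => Ok(PhysicalScalarFunction {
            fun: lookup_builtin(name)?,
            args: plan_args(args)?,
        }),
        // UDF: the implementation and return type already travel with the logical node.
        Expr::ScalarUDF { fun, args } => Ok(PhysicalScalarFunction {
            fun: fun.clone(),
            args: plan_args(args)?,
        }),
        _ => Err("only scalar functions are handled in this sketch".to_string()),
    }
}
```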