jorgecarleitao commented on pull request #8032:
URL: https://github.com/apache/arrow/pull/8032#issuecomment-679631810
> For built-in functions like `sqrt` I would expect DataFusion to provide
convenience functions to create an expression, like we do with `col` and the
aggregate functions. I assume we could also do that with the design proposed
here?
>
> For example, I would like to be able to write:
>
> ```rust
> df.select(vec![col("foo"), sqrt(col("bar"))])?
> ```
This PR does not support this, as it treats every function (built-in or
not) equally. To cover that case, IMO this PR needs to add new variants to the
logical `Expr` enum:
* `Expr::ScalarFunction { name: String/Enum, args: Vec<Expr> }`, whose return
type we know logically from `name` (e.g. `"sqrt"`), exactly like
`Expr::BinaryExpr`. This is mapped to a physical expression during planning.
These can be built without access to the registry, as we hard-code the return
type on the logical plan to be consistent with the physical one, like we do for
our aggregates, binary expressions, etc.
* `Expr::ScalarUDF { fun: ScalarFunction, args: Vec<Expr> }`, whose return
type is only known after consulting the registry to see what the user declared
it to be (as this PR currently does). Both variants are sketched below.
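
To make the proposal concrete, here is a minimal sketch of what the two variants could look like; the field names and the `ScalarFunction` placeholder struct are illustrative, not the actual DataFusion definitions:

```rust
use arrow::datatypes::DataType;

/// Placeholder for the user-registered UDF from this PR (illustrative only).
#[derive(Clone)]
pub struct ScalarFunction {
    pub name: String,
    pub return_type: DataType,
    // ... argument types, the function implementation, etc.
}

/// Sketch of the logical expression enum with the two proposed variants.
pub enum Expr {
    Column(String),
    /// Built-in function: the return type is derived from `name` during planning.
    ScalarFunction { name: String, args: Vec<Expr> },
    /// User-defined function: the return type comes from the registered `ScalarFunction`.
    ScalarUDF { fun: ScalarFunction, args: Vec<Expr> },
    // ... other variants (BinaryExpr, Literal, aggregates, ...)
}
```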
`sqrt` would return an `Expr::ScalarFunction`, which knows its own return type,
and the planner converts it to a physical `ScalarFunction` via a hard-coded
map, while physical planning of an `Expr::ScalarUDF` is just planning its
`args` and passing them to the physical `ScalarFunction`, as this PR already does.
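
Under that split, the `sqrt` helper from the quoted snippet could be written without any registry access (a sketch building on the types above; the helper names follow the quoted example):

```rust
/// Convenience constructor for the built-in `sqrt` (sketch).
/// No registry lookup is needed: the planner resolves "sqrt" itself.
pub fn sqrt(arg: Expr) -> Expr {
    Expr::ScalarFunction {
        name: "sqrt".to_string(),
        args: vec![arg],
    }
}

/// Convenience constructor for a column reference, as in `col("bar")`.
pub fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}
```

so that `sqrt(col("bar"))` builds a purely logical node whose return type the planner can later fill in from the hard-coded map.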
I.e. at the physical level, built-ins and UDFs are indistinguishable, but in
the logical plan one only knows its name (the built-in), while the other also
carries its physical representation, the `ScalarUDF`.
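
A rough sketch of that planning step, building on the types above; `PhysicalScalarFunction`, `lookup_builtin`, and `plan_scalar` are invented names for illustration, not the planner's real API:

```rust
/// Placeholder physical node (sketch): both logical variants end up here.
pub struct PhysicalScalarFunction {
    pub fun: ScalarFunction,
    pub args: Vec<PhysicalScalarFunction>, // simplified; real args are physical expressions
}

/// Hard-coded map from a built-in name to its implementation and return type.
fn lookup_builtin(name: &str) -> Result<ScalarFunction, String> {
    match name {
        "sqrt" => Ok(ScalarFunction {
            name: "sqrt".to_string(),
            return_type: DataType::Float64,
        }),
        other => Err(format!("unknown built-in function: {}", other)),
    }
}

/// Plan a logical scalar function: built-ins and UDFs become the same physical node.
fn plan_scalar(expr: &Expr) -> Result<PhysicalScalarFunction, String> {
    let plan_args = |args: &[Expr]| -> Result<Vec<PhysicalScalarFunction>, String> {
        args.iter().map(plan_scalar).collect()
    };
    match expr {
        // Built-in: resolve the name through the hard-coded map, then plan the args.
        Expr::ScalarFunction { name, args } => Ok(PhysicalScalarFunction {
            fun: lookup_builtin(name)?,
            args: plan_args(args)?,
        }),
        // UDF: the implementation and return type already travel with the logical node.
        Expr::ScalarUDF { fun, args } => Ok(PhysicalScalarFunction {
            fun: fun.clone(),
            args: plan_args(args)?,
        }),
        _ => Err("only scalar functions are handled in this sketch".to_string()),
    }
}
```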