2010YOUY01 commented on PR #7978:
URL:
https://github.com/apache/arrow-datafusion/pull/7978#issuecomment-1788433924
> Thank you @2010YOUY01 -- this is looking like a great first start.
>
> One thing that came to mind when I was reviewing this PR was what about
simply moving `BuiltInScalarFunction`s entirely into the `physical_expr` crate
(and remove `Expr::ScalarFunction`). We would have to extend ScalarUDF... but
that would have the nice property of keeping things uniform
Thank you for your review.
For the `impl ScalarFunctionDef` is split in two crate issue, here are my
thoughts:
### Motivation
I think it's better to `impl ScalarFunctionDef` at logical expression level,
to mimic all buitlin functions are defined outside the core, then they can go
through planning, logical plan optimization, and all the way to execution. This
way we can make sure the new interface is complete and replace all execution
code.
### `impl ScalarFunctionDef` fully in `phys-expr` crate
If we move `impl ScalarFunctionDef for BuiltinScalarFunction` into
`phys-expr` crate, then we can only replace the last half of execution code
(initial physical plan -> physical plan optimization -> execution) with the new
interface.
This way we have to keep two `Expr` nodes existing at the same time
```rust
// Expr node for scalar functions
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarFunction {
pub fun: built_in_function::BuiltinScalarFunction,
pub args: Vec<Expr>,
}
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarFunction2 {
pub fun: Arc<dyn ScalarFunctionDef>,
pub args: Vec<Expr>,
}
```
Then the execution would be
{Original BuiltinScalarFunctions} --(planning)--> `Expr::ScalarFunction` -->
`dyn ScalarFunctionDef` --> PhysicalPlanNode
{moved-out-function-packages, UDFs} --(planning)--> `Expr::ScalarFunction2`
--> `dyn ScalarFunctionDef` --> PhysicalPlanNode
This way we can only replace the function execution code for physical plan,
logical plan code can only be removed after all functions are moved to the new
interface. So the physical plan part of the code can be reused, but there will
be two separate code path for logical planning.
(But this PR showed the new interface is able to cover full functionalities,
if the above-mentioned issues are acceptable, we can go this way)
### `impl ScalarFunctionDef` split in two places
1. We can replace all SQL execution code at first
2. Only keep one `Expr` node
```rust
// Expr node for scalar functions
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarFunction {
pub fun: Arc<dyn ScalarFunctionDef>, // modified
pub args: Vec<Expr>,
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]