linhr commented on PR #18921:
URL: https://github.com/apache/datafusion/pull/18921#issuecomment-3678691204
> Could you expand this further? It is like `LambdaUDF` having a
> `resolve_lambdas` method which result is passed to the `invoke_with_args`
> method?
Suppose we have `array_transform([1, 2], v -> v*2)`. We could have a trait
`LambdaUDF` and have `impl LambdaUDF for ArrayTransform`. (Or we follow the
existing convention to have `struct LambdaUDF` and `trait LambdaUDFImpl`
separately.) A logical representation of `v -> v*2` is passed to
`ArrayTransform::new()`. For `Expr::LambdaFunction(LambdaFunction)`, we can
have `LambdaFunction { func: Arc<dyn LambdaUDF>, args: Vec<Expr> }` where the
non-lambda parameter `[1, 2]` is stored in `args`.
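To make the logical side concrete, here is a minimal sketch using only the standard library. The `Expr` enum is a toy stand-in, not DataFusion's `Expr`, and the `Lambda` struct, the trait methods `name` and `lambdas`, and the helper `example_call` are hypothetical names I made up for illustration. Only `LambdaUDF`, `ArrayTransform`, and the `LambdaFunction { func, args }` shape come from the description above.

```rust
use std::fmt::Debug;
use std::sync::Arc;

// Toy stand-in for a logical expression (NOT DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Multiply(Box<Expr>, Box<Expr>),
    Array(Vec<Expr>),
}

// Hypothetical logical representation of a lambda such as `v -> v * 2`.
#[derive(Debug, Clone, PartialEq)]
struct Lambda {
    params: Vec<String>,
    body: Expr,
}

// Sketch of the proposed trait; the method set here is an assumption.
trait LambdaUDF: Debug {
    fn name(&self) -> &str;
    fn lambdas(&self) -> &[Lambda];
}

// The function instance owns its lambda(s), passed in through `new`.
#[derive(Debug)]
struct ArrayTransform {
    lambdas: Vec<Lambda>,
}

impl ArrayTransform {
    fn new(lambda: Lambda) -> Self {
        Self { lambdas: vec![lambda] }
    }
}

impl LambdaUDF for ArrayTransform {
    fn name(&self) -> &str {
        "array_transform"
    }

    fn lambdas(&self) -> &[Lambda] {
        &self.lambdas
    }
}

// Payload of `Expr::LambdaFunction(..)`: only non-lambda arguments live in `args`.
#[derive(Debug)]
struct LambdaFunction {
    func: Arc<dyn LambdaUDF>,
    args: Vec<Expr>,
}

// Builds the logical form of `array_transform([1, 2], v -> v * 2)`.
fn example_call() -> LambdaFunction {
    let double = Lambda {
        params: vec!["v".to_string()],
        body: Expr::Multiply(
            Box::new(Expr::Column("v".to_string())),
            Box::new(Expr::Literal(2)),
        ),
    };
    LambdaFunction {
        func: Arc::new(ArrayTransform::new(double)),
        args: vec![Expr::Array(vec![Expr::Literal(1), Expr::Literal(2)])],
    }
}
```

The point of the shape is that `args` never contains a lambda, so generic code that walks `args` does not need to know about lambda parameters such as `v`.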
During physical planning, we could resolve `ArrayTransform` into
`PhysicalArrayTransform`, which stores `v -> v*2` resolved into a
`PhysicalExpr`. We have a trait `PhysicalLambdaUDF` and `impl PhysicalLambdaUDF
for PhysicalArrayTransform`. The trait method
`PhysicalLambdaUDF::invoke_with_args` accepts the Arrow array `[1, 2]` and
computes the results. I'd imagine this invocation can be done in a general
(physical) `LambdaFunctionExpr` that works for all lambda functions, similar to
`ScalarFunctionExpr`.
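A rough sketch of the physical side, again with only the standard library. Plain `Vec<i64>` stands in for an Arrow array, and the `PhysicalExpr` trait here is a toy with a made-up `evaluate` signature, not DataFusion's. `TimesTwo` and `run_array_transform` are hypothetical helpers, and the `invoke_with_args` signature is an assumption.

```rust
use std::sync::Arc;

// Toy stand-in for a physical expression evaluated over the lambda parameter's values.
trait PhysicalExpr {
    fn evaluate(&self, input: &[i64]) -> Vec<i64>;
}

// Hypothetical result of resolving `v -> v * 2` during physical planning.
struct TimesTwo;

impl PhysicalExpr for TimesTwo {
    fn evaluate(&self, input: &[i64]) -> Vec<i64> {
        input.iter().map(|v| v * 2).collect()
    }
}

// Sketch of the proposed trait; `Vec<i64>` stands in for an Arrow array.
trait PhysicalLambdaUDF {
    fn invoke_with_args(&self, args: &[Vec<i64>]) -> Result<Vec<i64>, String>;
}

// Stores the lambda already resolved into a physical expression.
struct PhysicalArrayTransform {
    lambda: Arc<dyn PhysicalExpr>,
}

impl PhysicalLambdaUDF for PhysicalArrayTransform {
    fn invoke_with_args(&self, args: &[Vec<i64>]) -> Result<Vec<i64>, String> {
        match args {
            // Exactly one non-lambda argument: the array to transform.
            [values] => Ok(self.lambda.evaluate(values)),
            _ => Err(format!(
                "array_transform expects 1 array argument, got {}",
                args.len()
            )),
        }
    }
}

// General physical node that works for any `PhysicalLambdaUDF`.
struct LambdaFunctionExpr {
    func: Arc<dyn PhysicalLambdaUDF>,
    // Non-lambda arguments, assumed to be already evaluated to arrays.
    args: Vec<Vec<i64>>,
}

impl LambdaFunctionExpr {
    fn evaluate(&self) -> Result<Vec<i64>, String> {
        self.func.invoke_with_args(&self.args)
    }
}

// Wires the pieces together for `array_transform(values, v -> v * 2)`.
fn run_array_transform(values: Vec<i64>) -> Result<Vec<i64>, String> {
    let expr = LambdaFunctionExpr {
        func: Arc::new(PhysicalArrayTransform {
            lambda: Arc::new(TimesTwo),
        }),
        args: vec![values],
    };
    expr.evaluate()
}
```

With this split, `LambdaFunctionExpr` only evaluates the non-lambda arguments and delegates; how the resolved lambda is applied stays inside each `PhysicalLambdaUDF` implementation.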
When I worked with `ScalarUDF`, I noticed that the logic required for logical
representation, physical planning, and the actual execution is all within a
single `ScalarUDFImpl` trait. If we look at how these trait methods are used by
the various planning/execution stages, we might get the big picture of how a
parallel code structure (with multiple traits to separate the responsibilities)
can be designed for lambda functions.
I haven't thought about the function registry, documentation, etc., which we
get for free in the existing `ScalarUDF` setup. So more investigation is
needed to estimate the amount of work if we explore the route I described above.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]