gstvg commented on code in PR #18921:
URL: https://github.com/apache/datafusion/pull/18921#discussion_r2898017709
##########
datafusion/substrait/src/logical_plan/producer/expr/mod.rs:
##########
@@ -152,6 +152,9 @@ pub fn to_substrait_rex(
not_impl_err!("Cannot convert {expr:?} to Substrait")
}
Expr::Unnest(expr) => not_impl_err!("Cannot convert {expr:?} to
Substrait"),
+ Expr::LambdaFunction(expr) => producer.handle_lambda_function(expr,
schema),
+ Expr::Lambda(expr) => not_impl_err!("Cannot convert {expr:?} to
Substrait"), // not yet implemented in substrait-rs
+ Expr::LambdaVariable(expr) => not_impl_err!("Cannot convert {expr:?}
to Substrait"), // not yet implemented in substrait-rs
Review Comment:
Thanks @LiaCastaneda for the info and @benbellick for the release
> I wonder if the expressions can actually be mapped 1:1
I did a quick check and I believe it's possible, yes.
`LambdaFunction` produces a regular substrait `ScalarFunction`, and to
consume we just check whether any `LambdaUDF` exists with the given function
name and/or if any arg is a lambda.
To produce a substrait
[Lambda](https://substrait.io/expressions/lambda_expressions/#lambda-expression-structure)
with the field `parameters` struct where each field corresponds to a
parameter, we use the return of `LambdaUDF::lambdas_parameters`, and for
consuming, we can use default parameters names like `p1`, `p2` etc
To produce a
[FieldReference](https://substrait.io/expressions/field_references/)/[LambdaParameterReference](https://substrait.io/expressions/lambda_expressions/#parameter-references)
with a correct `steps_out` and `direct_reference.struct_field.field`, we keep
a `HashMap<String, (usize, usize)>` of `lambda_parameter_name => (step_outs,
struct_field_index)`, that get's updated for every `Lambda` to be produced:
`step_outs` is reset to 0 for shadowed parameters, others incremented by one,
and `struct_field_index` set according to it's position within the parameters
of the lambda it originates from. Consuming into `LambdaVariable` with it's
field resolved is similar to SQL planning: keep a `HashMap<usize, FieldRef>` of
`direct_reference.struct_field.field => field` that get's updated for every
`LambdaUDF` by the return of it's `lambdas_parameters` method
I'll address your another comments and then implement substrait support on
another branch to make sure is viable
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]