findepi commented on code in PR #13717:
URL: https://github.com/apache/datafusion/pull/13717#discussion_r1888341956
##########
datafusion/core/src/catalog_common/information_schema.rs:
##########
@@ -406,6 +406,7 @@ fn get_udf_args_and_return_types(
.into_iter()
.map(|arg_types| {
// only handle the function which implemented
[`ScalarUDFImpl::return_type`] method
+ #[allow(deprecated)]
Review Comment:
> ```rust
> struct ReturnTypeArgs {
> // Argments, if available. This may not be available in some
contexts
> // such as information_table schema queries
> args: Option<&[Expr]>
> // schema, if available.
> schema: Option<&dyn ExprSchema>,
> arg_types: &[DataType],
> }
> ```
the seemingly single method `udf.return_type_with_args` will be invokable
with `ReturnTypeArgs` argument, which may or may not carry expressions. If the
answer depends on the expression presence, it can depend in two ways
1. type is known when expressions are known (and are something particular),
type is unknown otherwise
2. type is A when expressions are known (and are something particular), type
is B (!= A) otherwise
The second result is clearly not acceptable from planning perspective
(https://github.com/apache/datafusion/pull/13717#discussion_r1879725758).
Retrieving a false type B is misleading. Type B cannot be used if the truth is
type A. Which leads us to the conclusion that the first option is the only
acceptable option.
For the first option, we realize that for some functions -- like for some
syntaxes -- type is known when inputs are known. This is normal in query
engines. This is normally handled by analyzer piece of a query engine, nothing
special about it. This means that from query planning perspective, "what is
this UDF return type" is a perfectly valid question with a perfectly well
defined answer, that analyzer needs to know, should capture and persist. Once
answered, there is no reason to ask this question ever again.
The remaining use-case seems to be `information_schema.routines`
perspective. There, the return type is unknown (or is a set of possible types),
because the callsite is undefined.
I am therefore not convinced the introduction of `return_type_with_args` is
a good approach. It just hides the problem, without addressing it. It's better
if the problem is clearly visible. IMO the way to address the problem is what
outlined above -- analyzer needs to know the type, so it should capture and
persist that information -- https://github.com/apache/datafusion/issues/12604.
I believe we do not need to spend time on `return_type_with_args`, since it
does not solve the crux of the problem.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]