findepi commented on code in PR #13717: URL: https://github.com/apache/datafusion/pull/13717#discussion_r1888341956
########## datafusion/core/src/catalog_common/information_schema.rs: ########## @@ -406,6 +406,7 @@ fn get_udf_args_and_return_types( .into_iter() .map(|arg_types| { // only handle the function which implemented [`ScalarUDFImpl::return_type`] method + #[allow(deprecated)] Review Comment: > ```rust > struct ReturnTypeArgs { > // Argments, if available. This may not be available in some contexts > // such as information_table schema queries > args: Option<&[Expr]> > // schema, if available. > schema: Option<&dyn ExprSchema>, > arg_types: &[DataType], > } > ``` the seemingly single method `udf.return_type_with_args` will be invokable with `ReturnTypeArgs` argument, which may or may not carry expressions. If the answer depends on the expression presence, it can depend in two ways 1. type is known when expressions are known (and are something particular), type is unknown otherwise 2. type is A when expressions are known (and are something particular), type is B (!= A) otherwise The second result is clearly not acceptable from planning perspective (https://github.com/apache/datafusion/pull/13717#discussion_r1879725758). Retrieving a false type B is misleading. Type B cannot be used if the truth is type A. Which leads us to the conclusion that the first option is the only acceptable option. For the first option, we realize that for some functions -- like for some syntaxes -- type is known when inputs are known. This is normal in query engines. This is normally handled by analyzer piece of a query engine, nothing special about it. This means that from query planning perspective, "what is this UDF return type" is a perfectly valid question with a perfectly well defined answer, that analyzer needs to know, should capture and persist. Once answered, there is no reason to ask this question ever again. The remaining use-case seems to be `information_schema.routines` perspective. There, the return type is unknown (or is a set of possible types), because the callsite is undefined. I am therefore not convinced the introduction of `return_type_with_args` is a good approach. It just hides the problem, without addressing it. It's better if the problem is clearly visible. IMO the way to address the problem is what outlined above -- analyzer needs to know the type, so it should capture and persist that information -- https://github.com/apache/datafusion/issues/12604. I believe we do not need to spend time on `return_type_with_args`, since it does not solve the crux of the problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org