jorgecarleitao commented on pull request #7967:
URL: https://github.com/apache/arrow/pull/7967#issuecomment-678781518
The code you pointed to reads `return_type: DataType`. I will assume you
mean the return type declared in `Expr::ScalarFunction`.
Two minds thinking alike: I was just trying to do that in the codebase.
Unfortunately, I do not think that that is sufficient 😞 : when a projection
is declared, we need to resolve its schema's type, which we do via
`Expr::get_type`. If we do not have the UDF's `return_type` on
`Expr::ScalarFunction`, we can't know its return type, which means we can't
even project (even before optimizations).
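
To illustrate the point above, here is a minimal sketch of why the return type has to be available on the expression. All types below are simplified, hypothetical stand-ins and not the actual definitions in the codebase; only the names `Expr::ScalarFunction`, `return_type`, and `get_type` come from the discussion.

```rust
// Hypothetical, simplified stand-ins for the real types (assumption, not the actual code).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
}

#[derive(Debug, Clone)]
pub enum Expr {
    // A column whose type is already known from the input schema.
    Column { name: String, data_type: DataType },
    // A UDF call that carries its own return type.
    ScalarFunction {
        name: String,
        args: Vec<Expr>,
        return_type: DataType,
    },
}

impl Expr {
    // Resolve the output type of an expression. For a UDF call, the only
    // source of truth in this sketch is the `return_type` stored on the variant.
    pub fn get_type(&self) -> DataType {
        match self {
            Expr::Column { data_type, .. } => data_type.clone(),
            Expr::ScalarFunction { return_type, .. } => return_type.clone(),
        }
    }
}

// Building the output schema of a projection requires the type of every expression.
pub fn projection_types(exprs: &[Expr]) -> Vec<DataType> {
    exprs.iter().map(Expr::get_type).collect()
}
```

If `return_type` were removed from the variant, the `ScalarFunction` arm of `get_type` would have nothing to return, so the projection's schema could not be built.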
But to get the UDF's `DataType`, we need to access the UDF's registry. What
we currently do is let the user declare the `DataType` for us in the logical
plan via the call `scalar_function("name", vec![args..], DATATYPE)`.
Unfortunately, this means that the user needs to know the return type of the
UDF, or everything breaks during planning, when the physical plan no longer
agrees with the logical one. I would prefer that the user does not have to
carry this burden: they register a UDF with its return type, and then just
plan a call without specifying it; the return type is resolved during planning.
I am formalizing a proposal to address this. The gist is that we can't have
"meta" of UDFs in the logical plan: they need to know their return type, which
means that we need to access the registry during planning.
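
As a rough sketch of that idea (not the proposal itself), the planner could look up the return type in a registry instead of asking the user for it. Every name below (`UdfRegistry`, `UnresolvedCall`, `ResolvedCall`, `resolve`) is hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the real type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
}

// Hypothetical registry mapping a UDF name to its declared return type.
#[derive(Default)]
pub struct UdfRegistry {
    return_types: HashMap<String, DataType>,
}

impl UdfRegistry {
    // The user declares the return type once, at registration time.
    pub fn register(&mut self, name: &str, return_type: DataType) {
        self.return_types.insert(name.to_string(), return_type);
    }

    // Look up the return type; an unknown UDF is a planning error.
    pub fn return_type(&self, name: &str) -> Result<DataType, String> {
        self.return_types
            .get(name)
            .cloned()
            .ok_or_else(|| format!("UDF '{}' is not registered", name))
    }
}

// A call as the user writes it: no return type supplied.
pub struct UnresolvedCall {
    pub name: String,
}

// The same call after planning, with its return type filled in.
#[derive(Debug, PartialEq)]
pub struct ResolvedCall {
    pub name: String,
    pub return_type: DataType,
}

// Planning step: consult the registry to resolve the call's return type.
pub fn resolve(call: UnresolvedCall, registry: &UdfRegistry) -> Result<ResolvedCall, String> {
    let return_type = registry.return_type(&call.name)?;
    Ok(ResolvedCall {
        name: call.name,
        return_type,
    })
}
```

With this shape, a mismatch between the declared and the actual return type cannot be introduced at the call site, because the call site never states a type.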
I am developing some ideas for this
[here](https://docs.google.com/document/d/1Kzz642ScizeKXmVE1bBlbLvR663BKQaGqVIyy9cAscY/edit?usp=sharing).