notfilippo commented on PR #22572: URL: https://github.com/apache/datafusion/pull/22572#issuecomment-4606695060
@vbarua, the goal of this PR is to **automatically** catalog UDFs defined via DataFusion's UDF trait. This makes it possible to expose the engine's capabilities in the Substrait extension format, both for the functions DataFusion ships with by default and for the functions you define yourself (via the public API this PR exposes).

> For example, in your generated inventory you have a function definition for [abs](https://github.com/apache/datafusion/pull/22572/changes#diff-23b502d99c93a30effb9654219b8ede0f979b4f62394b628a43384eaef955a99R694-R767), but that is a function that is already defined in the upstream in [abs](https://github.com/substrait-io/substrait/blob/0dba93b2aeb66cef10576a25200f984945eb3dff/extensions/functions_arithmetic.yaml#L765-L771).

The API this PR exposes has the concept of overrides (which also addresses the concern @benbellick raised). With this configuration, we could introduce a way to map DataFusion UDF implementations to the golden definitions that Substrait provides upstream. For example, DataFusion's `abs` aligns 100% with Substrait's `abs`, so we can skip generating it and simply refer to the existing upstream definition of `abs`.

I think this is a good follow-up to this PR, since it requires work to make sure the semantics _really_ align with what Substrait defines upstream (e.g. nullability could differ).

Once we have mapped all functionality and deduplicated identical functions, we can continue this work and finally formalize DataFusion's Substrait parser, which will then be able to validate function definitions not just by name but also against the catalog.
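A minimal sketch of the override idea, assuming a simple name-keyed table. `Overrides`, `FunctionSource`, `map_to_upstream`, `resolve`, and `generated_inventory` are hypothetical names used only for illustration; they are not the API this PR exposes. A UDF with a verified upstream match resolves to a reference to the existing Substrait definition, while every other UDF still gets a generated definition in the DataFusion inventory:

```rust
use std::collections::HashMap;

/// Hypothetical: where the Substrait definition of a function lives.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FunctionSource {
    /// Refer to an existing upstream Substrait extension (URI + function name).
    Upstream { extension_uri: String, name: String },
    /// Emit a DataFusion-generated definition in the DataFusion extension.
    Generated { name: String },
}

/// Hypothetical override table: DataFusion UDF name -> (extension URI, upstream name).
/// Entries should only be added once the semantics (e.g. nullability) are verified to match.
#[derive(Debug, Default)]
pub struct Overrides {
    upstream: HashMap<String, (String, String)>,
}

impl Overrides {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that `udf` is semantically identical to `upstream_name` in `extension_uri`.
    pub fn map_to_upstream(&mut self, udf: &str, extension_uri: &str, upstream_name: &str) {
        self.upstream.insert(
            udf.to_string(),
            (extension_uri.to_string(), upstream_name.to_string()),
        );
    }

    /// Decide whether a UDF refers to upstream or needs a generated definition.
    pub fn resolve(&self, udf: &str) -> FunctionSource {
        match self.upstream.get(udf) {
            Some((uri, name)) => FunctionSource::Upstream {
                extension_uri: uri.clone(),
                name: name.clone(),
            },
            None => FunctionSource::Generated {
                name: udf.to_string(),
            },
        }
    }

    /// Keep only the UDFs that still need a generated definition (deduplicated inventory).
    pub fn generated_inventory(&self, udfs: &[&str]) -> Vec<String> {
        udfs.iter()
            .filter_map(|udf| match self.resolve(udf) {
                FunctionSource::Generated { name } => Some(name),
                FunctionSource::Upstream { .. } => None,
            })
            .collect()
    }
}
```

For example, after `overrides.map_to_upstream("abs", uri, "abs")` (where `uri` is whatever extension URI points at Substrait's `functions_arithmetic.yaml`), `generated_inventory(&["abs", "my_udf"])` would keep only `my_udf`. Keying by name keeps the sketch short; a real mapping would likely need to match per signature, since overloads and nullability can differ between DataFusion and the upstream definition.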
