notfilippo commented on PR #22572:
URL: https://github.com/apache/datafusion/pull/22572#issuecomment-4606695060

   @vbarua, the goal of this PR is to **automatically** catalog UDFs defined 
via DataFusion's trait. It lets us expose the engine's capabilities in 
the Substrait extension format, both for the functions that the engine ships 
with by default and for the functions you define yourself (via 
the public API that this PR exposes).
   
   > For example, in your generated inventory you have a function definition 
for 
[abs](https://github.com/apache/datafusion/pull/22572/changes#diff-23b502d99c93a30effb9654219b8ede0f979b4f62394b628a43384eaef955a99R694-R767),
 but that is a function that is already defined in the upstream in 
[abs](https://github.com/substrait-io/substrait/blob/0dba93b2aeb66cef10576a25200f984945eb3dff/extensions/functions_arithmetic.yaml#L765-L771).
   
   The API this PR exposes has the concept of overrides (which also 
answers the concern @benbellick was raising). Through that configuration we 
could introduce a mapping from DataFusion UDF implementations to the golden 
definitions that Substrait provides upstream (e.g. DataFusion's `abs` aligns 
100% with Substrait's `abs`, so we can skip it and just refer to the existing 
definition of `abs`).
   
   I think that's a good follow-up to this PR, since it takes work to make 
sure the semantics _really_ align with what Substrait defines upstream (e.g. 
nullability could differ).
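   
   A rough sketch of what such a mapping could look like. All names here 
(`Resolution`, `Overrides`, `map_to_upstream`, `resolve`) are hypothetical and 
are not the API in this PR; the sketch only shows the intended fallback 
behaviour, where a UDF keeps its generated definition unless someone has 
verified that it matches an upstream one:

```rust
use std::collections::HashMap;

/// Hypothetical: where the catalog entry for one UDF comes from.
#[derive(Debug, Clone, PartialEq)]
enum Resolution {
    /// Semantics verified against upstream: reference it, generate nothing.
    Upstream { file: String, name: String },
    /// No verified upstream match: keep the auto-generated definition.
    Generated,
}

/// Hypothetical override table, keyed by DataFusion UDF name.
#[derive(Default)]
struct Overrides {
    upstream: HashMap<String, Resolution>,
}

impl Overrides {
    /// Record that `udf` matches the upstream function `name` in `file`.
    /// Meant to be called only after the semantics (nullability, etc.)
    /// have been checked by hand.
    fn map_to_upstream(&mut self, udf: &str, file: &str, name: &str) {
        self.upstream.insert(
            udf.to_string(),
            Resolution::Upstream {
                file: file.to_string(),
                name: name.to_string(),
            },
        );
    }

    /// Anything without an explicit override falls back to generation.
    fn resolve(&self, udf: &str) -> Resolution {
        self.upstream
            .get(udf)
            .cloned()
            .unwrap_or(Resolution::Generated)
    }
}
```

   With this shape, mapping `abs` to `functions_arithmetic.yaml` would make 
`resolve("abs")` return the upstream reference, while any unmapped UDF would 
still resolve to `Generated`, so nothing is deduplicated by accident.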
   
   Once we have mapped all functionality and deduplicated identical functions, 
we can continue this work and finally formalize DataFusion's Substrait parser, 
which will gain the capability of validating function definitions not just by 
name but also by catalog.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

