alamb commented on issue #5600:
URL: https://github.com/apache/arrow-datafusion/issues/5600#issuecomment-1483798905
My opinion on this matter is that ideally DataFusion should be an extensible engine, so people using it can pick whichever parts they want and replace the rest with their own implementations. DataFusion ships a bunch of pre-built functionality (the mostly PG-compatible functions, the parquet / json / etc. readers, a memory catalog, and so on) to get people started, so they can focus on extending whatever matters most for their use case.

So I think it would be great to have a separate crate with "spark compatible functions" (and perhaps the same could be done for a "postgres compatible functions" crate). I don't think the BuiltInFunction mechanism is required long term; it would be better if all functions behaved the same way as user defined functions.

Then the question becomes where that crate's code is stored. It is probably fine for it to live in the main datafusion repo initially, and if it gets too unwieldy we could break it out into its own repo or something. But the ability to customize which functions are available is, I think, the key point.
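To illustrate the "everything behaves like a user defined function" idea, here is a minimal sketch of how such a crate could register one of its functions through the existing UDF API instead of a BuiltInFunction variant. The function name `spark_add_one` is hypothetical, and the exact `create_udf` signature and module paths have shifted across DataFusion releases, so treat this as illustrative rather than the actual proposal:

```rust
// Sketch only: registering a hypothetical "spark compatible" function as an
// ordinary UDF. Exact create_udf signature varies between DataFusion releases.
use std::sync::Arc;

use datafusion::arrow::array::{ArrayRef, Int64Array};
use datafusion::arrow::datatypes::DataType;
use datafusion::error::{DataFusionError, Result};
use datafusion::logical_expr::{ColumnarValue, Volatility};
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // The kernel is a plain closure over ColumnarValue, exactly like any
    // user defined scalar function.
    let add_one = Arc::new(|args: &[ColumnarValue]| -> Result<ColumnarValue> {
        let array = match &args[0] {
            ColumnarValue::Array(a) => a.clone(),
            _ => {
                return Err(DataFusionError::NotImplemented(
                    "scalar arguments not handled in this sketch".to_string(),
                ))
            }
        };
        let ints = array
            .as_any()
            .downcast_ref::<Int64Array>()
            .expect("int64 input");
        let result: Int64Array = ints.iter().map(|v| v.map(|x| x + 1)).collect();
        Ok(ColumnarValue::Array(Arc::new(result) as ArrayRef))
    });

    // Wrap it as a ScalarUDF; some releases take DataType for the return
    // type directly instead of Arc<DataType>.
    let udf = create_udf(
        "spark_add_one", // hypothetical spark-compatible function name
        vec![DataType::Int64],
        Arc::new(DataType::Int64),
        Volatility::Immutable,
        add_one,
    );

    // After registration the function is callable from SQL and the
    // DataFrame API just like a built-in one.
    let ctx = SessionContext::new();
    ctx.register_udf(udf);

    ctx.sql("SELECT spark_add_one(column1) FROM (VALUES (1), (2)) AS t")
        .await?
        .show()
        .await?;
    Ok(())
}
```

A crate of "spark compatible functions" could then just export a helper that registers all of its UDFs on a `SessionContext`, and users who don't want them simply don't call it.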
