alamb commented on issue #5600:
URL: https://github.com/apache/arrow-datafusion/issues/5600#issuecomment-1483798905
My opinion on this matter is that ideally DataFusion should be an extensible engine, so people using it can pick whichever parts they want and replace the rest with their own implementations. DataFusion ships a bunch of pre-built functionality (the mostly PG-compatible functions, the parquet / json / etc. readers, a memory catalog, and so on) to get people started, so they can focus on extending whatever matters most for their use case.

So I think it would be great to have a separate crate with "spark compatible functions" (and perhaps the same could be done for a "postgres compatible functions" crate). I don't think the BuiltInFunction mechanism is required long term; it would be better if all functions behaved the same way as user defined functions.

Then the question becomes where that crate's code is stored. It is probably fine for it to live in the main datafusion repo initially, and if it gets too unwieldy we could break it out into its own repo or something. But the ability to customize which functions are available is, I think, the key point.
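To illustrate the "everything behaves like a user defined function" idea, here is a minimal sketch of how such a crate could register one of its functions through the existing UDF API instead of a BuiltInFunction variant. The function name `spark_add_one` is hypothetical, and the exact `create_udf` signature and module paths have shifted across DataFusion releases, so treat this as illustrative rather than the actual proposal:

```rust
// Sketch only: registering a hypothetical "spark compatible" function as an
// ordinary UDF. Exact create_udf signature varies between DataFusion releases.
use std::sync::Arc;

use datafusion::arrow::array::{ArrayRef, Int64Array};
use datafusion::arrow::datatypes::DataType;
use datafusion::error::{DataFusionError, Result};
use datafusion::logical_expr::{ColumnarValue, Volatility};
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // The kernel is a plain closure over ColumnarValue, exactly like any
    // user defined scalar function.
    let add_one = Arc::new(|args: &[ColumnarValue]| -> Result<ColumnarValue> {
        let array = match &args[0] {
            ColumnarValue::Array(a) => a.clone(),
            _ => {
                return Err(DataFusionError::NotImplemented(
                    "scalar arguments not handled in this sketch".to_string(),
                ))
            }
        };
        let ints = array
            .as_any()
            .downcast_ref::<Int64Array>()
            .expect("int64 input");
        let result: Int64Array = ints.iter().map(|v| v.map(|x| x + 1)).collect();
        Ok(ColumnarValue::Array(Arc::new(result) as ArrayRef))
    });

    // Wrap it as a ScalarUDF; some releases take DataType for the return
    // type directly instead of Arc<DataType>.
    let udf = create_udf(
        "spark_add_one", // hypothetical spark-compatible function name
        vec![DataType::Int64],
        Arc::new(DataType::Int64),
        Volatility::Immutable,
        add_one,
    );

    // After registration the function is callable from SQL and the
    // DataFrame API just like a built-in one.
    let ctx = SessionContext::new();
    ctx.register_udf(udf);

    ctx.sql("SELECT spark_add_one(column1) FROM (VALUES (1), (2)) AS t")
        .await?
        .show()
        .await?;
    Ok(())
}
```

A crate of "spark compatible functions" could then just export a helper that registers all of its UDFs on a `SessionContext`, and users who don't want them simply don't call it.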
