[GitHub] [arrow-datafusion] yjshen commented on issue #5600: [DISCUSSION] Add separate crate to cover spark builtin functions

via GitHub Tue, 14 Mar 2023 16:17:18 -0700


yjshen commented on issue #5600:
URL: https://github.com/apache/arrow-datafusion/issues/5600#issuecomment-1468994048


   Separating Spark functions into a dedicated crate seems reasonable, but 
supporting Spark UDFs requires significant effort. Many Spark UDFs are designed 
to be compatible with Hive, so they handle corner cases differently from other 
databases such as PostgreSQL (PG). These corner cases add substantially to the 
work of integrating Spark/Hive with DataFusion.
   
   When developing Blaze, we must either compare the two engines' implementations 
or first port the tests, to make sure a UDF has identical semantics in both 
engines before handing it to DataFusion for execution.

