linhr commented on issue #15914:
URL: https://github.com/apache/datafusion/issues/15914#issuecomment-3173229180

   As an update from the Sail project, we have recently seen strong community 
interest in expanding the coverage of Spark functions. Many of our community 
members have already contributed a ton of enhancements and bug fixes related to 
Spark functions!
   
   One question I have is how we should port Spark functions that are 
implemented using DataFusion logical expressions rather than `ScalarUDF`s. 
(There are many examples like 
[this](https://github.com/lakehq/sail/blob/8b833e19ecb6f4cfde4ef291661e3e0c2299276f/crates/sail-plan/src/function/scalar/string.rs#L106-L116)
 in the Sail repo.) In many such cases, a similar DataFusion function already 
exists, and we only need proper type casting and similar adjustments to match 
Spark semantics. I feel we need a good story for how these can be ported.
   
   One way I can think of is to define a placeholder UDF and leverage 
`ScalarUDFImpl::simplify()` to rewrite it into the equivalent DataFusion 
expression, so that the placeholder UDF never becomes part of the physical 
plan. I'm not sure whether this is a hack, though, so I'd love to hear if 
there is a better way.
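
   To make the idea concrete, here is a minimal, std-only Rust sketch of the 
rewrite pattern. It is a toy model and not DataFusion's API: the `Expr`, 
`DataType`, and `Simplified` types, the `simplify_placeholder` hook, and the 
`spark_repeat` to `repeat` mapping with an `Int64` cast are all made up for 
illustration. It only shows the shape of the approach: a placeholder function 
node is replaced, during a rewrite pass, by a call to an existing function 
with the required cast, so the placeholder never reaches later planning stages.

```rust
// Toy model only: none of these types come from DataFusion.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Cast { expr: Box<Expr>, to: DataType },
    // A call to a named function; the name may be a placeholder or a built-in.
    Call { name: String, args: Vec<Expr> },
}

// Result of the simplify-style hook: either a replacement expression or the
// untouched arguments, handed back to the caller.
enum Simplified {
    Rewritten(Expr),
    Original(Vec<Expr>),
}

// Hypothetical hook: rewrites the placeholder `spark_repeat(s, n)` into
// `repeat(s, CAST(n AS Int64))`. The mapping is an assumption for illustration.
fn simplify_placeholder(name: &str, mut args: Vec<Expr>) -> Simplified {
    if name == "spark_repeat" && args.len() == 2 {
        let count = args.pop().unwrap();
        let text = args.pop().unwrap();
        return Simplified::Rewritten(Expr::Call {
            name: "repeat".to_string(),
            args: vec![
                text,
                Expr::Cast { expr: Box::new(count), to: DataType::Int64 },
            ],
        });
    }
    Simplified::Original(args)
}

// Bottom-up rewrite pass: children first, then the hook on each call node.
fn rewrite(expr: Expr) -> Expr {
    match expr {
        Expr::Call { name, args } => {
            let args: Vec<Expr> = args.into_iter().map(rewrite).collect();
            match simplify_placeholder(&name, args) {
                Simplified::Rewritten(e) => e,
                Simplified::Original(args) => Expr::Call { name, args },
            }
        }
        Expr::Cast { expr, to } => Expr::Cast {
            expr: Box::new(rewrite(*expr)),
            to,
        },
        other => other,
    }
}

// Returns true if any call node in the tree still uses the function `target`.
fn mentions(expr: &Expr, target: &str) -> bool {
    match expr {
        Expr::Column(_) => false,
        Expr::Cast { expr, .. } => mentions(expr, target),
        Expr::Call { name, args } => {
            name == target || args.iter().any(|a| mentions(a, target))
        }
    }
}
```

   Calling `rewrite` on a tree that contains a `spark_repeat` call yields a 
tree in which `mentions(&tree, "spark_repeat")` is false. In the real proposal 
the rewrite would live in the placeholder's `simplify()` implementation; 
whether that counts as a hack is exactly the open question.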
   
   cc @SparkApplicationMaster @davidlghellin @rafafrdz @anhvdq @jamesfricker, 
who have contributed to Sail, so that you are all aware of this effort in 
DataFusion as well. I feel there is a chance for broader collaboration!
   cc @shehabgamin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

