linhr commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-3173229180
As an update from the Sail project: we have recently seen strong community interest in expanding the coverage of Spark functions. Many of our community members have already contributed a ton of enhancements and bug fixes related to Spark functions!

One question I have is how we should port Spark functions that are implemented using DataFusion logical expressions rather than `ScalarUDF`s. (There are many examples like [this](https://github.com/lakehq/sail/blob/8b833e19ecb6f4cfde4ef291661e3e0c2299276f/crates/sail-plan/src/function/scalar/string.rs#L106-L116) in the Sail repo.) In many such situations, a similar DataFusion function already exists, and we only need proper type casting etc. to match the Spark semantics. I feel we need a good story for how these can be ported.

One way I can think of is to define a placeholder UDF and leverage `ScalarUDFImpl::simplify()` so that the UDF gets rewritten into the equivalent logical expression during simplification and never becomes part of the physical plan. I'm not sure if this is a hack though, so I'd love to hear if there is a better way.

cc @SparkApplicationMaster @davidlghellin @rafafrdz @anhvdq @jamesfricker, who contributed to Sail, so that you are all aware of this effort in DataFusion as well. I feel there is a chance for broader collaboration!

cc @shehabgamin
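
To make the placeholder idea a bit more concrete, here is a minimal sketch in plain Rust. It is a toy model and deliberately does not use DataFusion's types: `Expr`, `ScalarFn`, `Simplified`, `SparkStyleFn`, and the function names `spark_fn` and `native_fn` are all hypothetical, made up for illustration. The only thing it tries to show is the shape of the approach: the placeholder has no execution logic, and a simplification pass replaces it with a cast-then-call expression before anything would be planned physically.

```rust
// Toy model only: every type and name here is hypothetical, NOT DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    // Cast of an inner expression to a named type.
    Cast(Box<Expr>, &'static str),
    // Call of a function, identified by name, with arguments.
    Call(String, Vec<Expr>),
}

// Result of asking a function to simplify itself: either a replacement
// expression, or the untouched arguments handed back to the caller.
enum Simplified {
    Rewritten(Expr),
    Original(Vec<Expr>),
}

trait ScalarFn {
    fn name(&self) -> &str;
    // By default a function does not rewrite itself.
    fn simplify(&self, args: Vec<Expr>) -> Simplified {
        Simplified::Original(args)
    }
}

// Placeholder for a Spark-flavored function (assumed semantics: cast every
// argument to Int64, then delegate to an existing native function).
struct SparkStyleFn;

impl ScalarFn for SparkStyleFn {
    fn name(&self) -> &str {
        "spark_fn"
    }

    fn simplify(&self, args: Vec<Expr>) -> Simplified {
        let casted = args
            .into_iter()
            .map(|arg| Expr::Cast(Box::new(arg), "Int64"))
            .collect();
        Simplified::Rewritten(Expr::Call("native_fn".to_string(), casted))
    }
}

// Bottom-up pass: simplify the arguments first, then give the function
// itself a chance to replace the whole call.
fn simplify_expr(expr: Expr, registry: &[&dyn ScalarFn]) -> Expr {
    match expr {
        Expr::Cast(inner, ty) => Expr::Cast(Box::new(simplify_expr(*inner, registry)), ty),
        Expr::Call(name, args) => {
            let args: Vec<Expr> = args
                .into_iter()
                .map(|arg| simplify_expr(arg, registry))
                .collect();
            let found = registry.iter().find(|f| f.name() == name.as_str());
            match found {
                Some(f) => match f.simplify(args) {
                    Simplified::Rewritten(rewritten) => rewritten,
                    Simplified::Original(args) => Expr::Call(name, args),
                },
                None => Expr::Call(name, args),
            }
        }
        other => other,
    }
}

// Helper to check whether a call to `target` survives anywhere in the tree.
fn contains_call(expr: &Expr, target: &str) -> bool {
    match expr {
        Expr::Call(name, args) => {
            name.as_str() == target || args.iter().any(|arg| contains_call(arg, target))
        }
        Expr::Cast(inner, _) => contains_call(inner, target),
        _ => false,
    }
}
```

In this model, simplifying a `spark_fn(a, 1)` call with `SparkStyleFn` in the registry yields a `native_fn` call over the casted arguments, and `contains_call(&result, "spark_fn")` is false. Whether the real `ScalarUDFImpl::simplify()` hook can be relied on this way (for example, whether simplification is guaranteed to run before physical planning) is exactly the open question above.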