milenkovicm commented on issue #19: URL: https://github.com/apache/arrow-datafusion-comet/issues/19#issuecomment-1945907961
I'd like to make a suggestion. In my experience, a lot of production Spark workloads use some kind of UDF, and a good share of those UDFs are very simple and could be expressed as SQL expressions. I assume that when a UDF is used, Comet falls back to classic Spark execution, which might not be optimal (I might be wrong, and apologies if I am; I have not looked at the Comet code in depth). My suggestion is to consider adding functionality like https://nvidia.github.io/spark-rapids/docs/additional-functionality/udf-to-catalyst-expressions.html, which could speed up UDFs in the Comet case as well. I believe there is nothing GPU-specific in that code, so it could be reused; I'm just not sure what the best approach would be. Maybe @andygrove would be able to help.
