Yicong-Huang commented on issue #3763:
URL: https://github.com/apache/texera/issues/3763#issuecomment-3330981687

Down the road, we could fuse UDFs together to reduce data transfer, especially for the sort + visualization use case (Python, Plotly). In the short term, it can be useful for testing purposes, such as benchmarks for papers. It is also good to have more operators in general. I see no big harm in keeping it; it is functional, after all.

As for performance, Python sorting is generally slower than Scala sorting, but it’s unclear how much of the slowdown comes from cross-language data transfer versus the sorting itself. From Meng’s earlier inspection, it seems the sorting itself is not particularly slow.

All in all, I vote for adding a new Scala sort implementation and letting users choose, instead of replacing the existing Python sort.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
