zhengchenyu commented on PR #1660: URL: https://github.com/apache/incubator-uniffle/pull/1660#issuecomment-2072162406
@advancedxy Yes, this doesn't make sense for Spark SQL right now.

> In fact, our cluster mainly uses Hive on Tez right now, but we plan to update to Spark.

For Hive on Tez/MR, it does make sense. Hive doesn't use the combine feature of MR or Tez either, so why does it make sense? Because the records coming out of the shuffle are sorted, we can combine them in memory, so Hive's aggregation is then done entirely in memory.

In theory, the same can be done for Spark SQL: it could use a sorted shuffle and then aggregate in memory. But that would require a lot of changes to Spark SQL, so maybe we should focus on Tez/MR first.
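To illustrate why sorted shuffle output allows in-memory combining, here is a minimal Python sketch. It is not Uniffle, Hive, or Tez code; `combine_sorted` and the sample records are hypothetical names made up for this example. The assumption is only that records arrive as `(key, value)` pairs already sorted by key, so all values for one key are adjacent and only a single accumulator has to be held in memory at a time.

```python
from itertools import groupby
from operator import itemgetter


def combine_sorted(records, combine):
    """Yield (key, combined_value) from (key, value) pairs sorted by key.

    Hypothetical helper: because the input is key-sorted, every run of
    equal keys is contiguous, so one accumulator per run is enough.
    """
    for key, group in groupby(records, key=itemgetter(0)):
        values = (value for _, value in group)
        acc = next(values)          # first value of this key's run
        for value in values:        # fold the remaining values of the run
            acc = combine(acc, value)
        yield key, acc


# Sample input standing in for key-sorted shuffle output (assumed data).
sorted_records = [("a", 1), ("a", 2), ("b", 5), ("c", 3), ("c", 4)]
result = list(combine_sorted(sorted_records, lambda x, y: x + y))
print(result)
```

If the input were not sorted, the same key could appear in several separate runs and this approach would emit partial results, which is why the sorted shuffle matters here.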
