Re: [PR] [#1239] Remote merge on the shuffle server side. [incubator-uniffle]

via GitHub Tue, 23 Apr 2024 05:23:47 -0700


zhengchenyu commented on PR #1660:
URL: 
https://github.com/apache/incubator-uniffle/pull/1660#issuecomment-2072162406


   @advancedxy 
   Yes, it doesn't make sense for Spark SQL right now.
   
   > In fact, our cluster mainly uses Hive on Tez right now. But we have plans 
to update Spark.
   
   For Hive on Tez/MR, it makes sense. Hive also doesn't use the combine 
feature of MR or Tez, so why does it make sense?
   Because the records coming from the shuffle are sorted, we can combine them 
in memory, so Hive's aggregation is done entirely in memory. In theory, the 
same can be done with Spark SQL: it could use a sorted shuffle and then 
aggregate in memory. But that would require a lot of changes to Spark SQL. 
Maybe we should focus on Tez/MR first.
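
   To illustrate why sorted shuffle output lets the aggregation stay in memory, 
here is a minimal Python sketch. It is not Uniffle, Hive, or Spark code; the 
function `aggregate_sorted` and the sample records are hypothetical. It assumes 
the input is already sorted by key, so equal keys are adjacent and only one 
running aggregate has to be held at a time.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(records, combine):
    """Yield (key, aggregate) pairs from (key, value) records sorted by key.

    Hypothetical helper: because equal keys are adjacent in sorted input,
    one running accumulator per key is enough, with no hash table of all keys.
    """
    for key, group in groupby(records, key=itemgetter(0)):
        values = (value for _, value in group)
        acc = next(values)          # every group has at least one record
        for value in values:
            acc = combine(acc, value)
        yield key, acc


# Stand-in for records already merged and sorted by the shuffle server.
merged = [("a", 1), ("a", 4), ("b", 2), ("c", 3), ("c", 5)]
result = list(aggregate_sorted(merged, lambda x, y: x + y))
print(result)
```

   If the input were not sorted, the same key could reappear later in the 
stream, and the aggregation would need a map over all keys (or a spill) instead.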
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


