Re: [PR] [WIP][CELEBORN-1319] Optimize skew partition logic for Reduce Mode to avoid sorting shuffle files [celeborn]

via GitHub Sun, 07 Apr 2024 08:06:03 -0700


cfmcgrady commented on PR #2373:
URL: https://github.com/apache/celeborn/pull/2373#issuecomment-2041499262


   To reviewers:
   Here is an update on my recent validation work on our internal cluster, 
where I split skewed partitions at the Celeborn split level. I ran a job with 
the default Celeborn split size of 1GB and a Spark advisoryPartitionSize of 
64MB. However, only 1/16 of the tasks actually fetched shuffle data, while the 
rest were empty.
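
   To make the 1/16 figure concrete, here is a minimal sketch in Python. It is not Celeborn or Spark code. It assumes a hypothetical 4GB skewed partition written as four 1GB splits, assumes the number of sub-partition tasks equals the partition size divided by the advisory size, and assumes a split can only be handed to a task as a whole. The helper `assign_whole_splits` is made up for illustration.

```python
MB = 1024 * 1024
GB = 1024 * MB


def assign_whole_splits(split_sizes, num_tasks):
    """Hypothetical helper: hand out whole splits round-robin.

    A split is never divided, so a task gets either entire splits or nothing.
    """
    tasks = [[] for _ in range(num_tasks)]
    for i, size in enumerate(split_sizes):
        tasks[i % num_tasks].append(size)
    return tasks


# Assumed sizes, chosen to match the numbers quoted in the comment.
partition_bytes = 4 * GB   # hypothetical skewed partition
split_bytes = 1 * GB       # Celeborn split size used in the test
advisory_bytes = 64 * MB   # Spark advisory partition size used in the test

split_sizes = [split_bytes] * (partition_bytes // split_bytes)
num_tasks = partition_bytes // advisory_bytes  # one task per 64MB of data

tasks = assign_whole_splits(split_sizes, num_tasks)
non_empty = sum(1 for t in tasks if t)
print(f"{non_empty} of {len(tasks)} tasks read data")
```

   Under these assumptions there are 16 times as many tasks as splits, so 15 out of every 16 tasks have nothing to read.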
   
   After discussing this with @waitinfuture, @wangshengjie123, and @pan3793, 
we decided to split skewed partitions at chunk granularity instead, which 
yields sub-partitions with much more fine-grained data sizes. This is 
implemented in 
https://github.com/apache/celeborn/pull/2373/commits/dfeb731da692aeef1c513f5ac3837275146009f5
and I tested it on my internal cluster with online tasks. The Shuffle Read 
stage performed almost as well as it does with Spark's External Shuffle 
Service (ESS).
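
   As a rough illustration of the chunk-level idea (a sketch only, not the code in the commit above), consecutive chunks of a split can be packed greedily into sub-partitions of about the advisory size. The function `group_chunks` is hypothetical, and the 8MB chunk size is an assumption made for the example.

```python
MB = 1024 * 1024


def group_chunks(chunk_sizes, target_bytes):
    """Hypothetical helper: pack consecutive chunks into sub-partitions.

    Returns a list of (start, end) chunk index ranges, end exclusive.
    A range is closed as soon as adding the next chunk would exceed the target.
    """
    groups = []
    start = 0
    acc = 0
    for i, size in enumerate(chunk_sizes):
        if acc > 0 and acc + size > target_bytes:
            groups.append((start, i))
            start, acc = i, 0
        acc += size
    if acc > 0:
        groups.append((start, len(chunk_sizes)))
    return groups


# Assumed layout: one 1GB split stored as 128 chunks of 8MB each.
chunk_sizes = [8 * MB] * 128
groups = group_chunks(chunk_sizes, target_bytes=64 * MB)
print(len(groups), groups[0], groups[-1])
```

   With these assumed sizes the single 1GB split yields 16 ranges of 8 chunks each, so every one of the 16 tasks from the earlier scenario would have about 64MB to read.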


