cfmcgrady commented on PR #2373: URL: https://github.com/apache/celeborn/pull/2373#issuecomment-2041499262
To reviewers: I wanted to give you an update on my recent validation work on our internal cluster, where I split skewed partitions using the Celeborn split-level approach. I ran a job with the default Celeborn split size of 1GB and a Spark advisoryPartitionSize of 64MB. However, only 1 in 16 tasks (1GB / 64MB = 16) was fetching shuffle data, while the rest were empty.

After discussing this with @waitinfuture, @wangshengjie123, and @pan3793, we decided to leverage chunks to split skewed partitions, which yields sub-partitions with much finer-grained data sizes. This was implemented in https://github.com/apache/celeborn/pull/2373/commits/dfeb731da692aeef1c513f5ac3837275146009f5, and I tested it on my internal cluster with online tasks. The performance of the Shuffle Read stage was almost as good as ESS (Spark's External Shuffle Service).
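To illustrate why granularity matters, here is a minimal sketch. It is a simplified greedy packing, not Spark's AQE skew-split algorithm or Celeborn's actual code. It groups indivisible units (whole splits, or chunks) into sub-partitions of roughly the advisory size. The function name `pack_units`, the single 1GB split, and the 8MB chunk size are hypothetical assumptions chosen for illustration.

```python
from typing import List

MB = 1024 * 1024
GB = 1024 * MB


def pack_units(unit_sizes: List[int], target: int) -> List[List[int]]:
    """Hypothetical helper: greedily group consecutive units (splits or
    chunks) into sub-partitions of at most ~`target` bytes.
    A unit is never divided, so it bounds the finest granularity."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in unit_sizes:
        # Close the current group if adding this unit would exceed the target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


advisory = 64 * MB  # mirrors the 64MB advisoryPartitionSize above

# Assumed skewed partition stored as one 1GB Celeborn split file:
# it cannot be divided, so only one non-empty sub-partition exists.
split_level = pack_units([1 * GB], advisory)

# The same 1GB viewed as hypothetical 8MB chunks (128 chunks):
# the data spreads evenly across 16 sub-partitions of 64MB each.
chunk_level = pack_units([8 * MB] * 128, advisory)

print(len(split_level))  # 1
print(len(chunk_level))  # 16
```

With split-level units, 16 planned tasks still map to a single 1GB file, so one task reads everything. Chunk-level units let each task read about 64MB.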
