We have a different solution to this problem. In our tests, its performance in the shuffle write phase was roughly the same as that of ESS. I will submit a PR to the community in the next few days, and I hope we can discuss it then.

Regards,
Wang Xinyu

At 2025-06-29 22:32:31, "rexxiong" <rexxi...@apache.org> wrote:
>Thanks to Erik for the proposal.
>
>In fact, before Erik introduced this feature to the community, we had
>already discussed this idea together, and Erik's team implemented it
>internally. Later, we integrated this optimization into our production
>environment, and I must say it has significantly improved performance in
>skew scenarios. It not only enhances shuffle write efficiency notably but
>also improves cluster resource utilization, preventing overload on a few
>nodes.
>
>Additionally, there's a small issue to note: CIP-18 has already been used,
>you can use CIP-20 for this.
>
>
>Regards,
>Jiashu Xiong
>
>
>Erik fang <fme...@gmail.com> wrote on Fri, Jun 27, 2025 at 19:17:
>
>> Hi community,
>>
>> I'd like to start a discuss about CIP-18: Dynamically optimize shuffle
>> write parallelism
>>
>> This proposal aims to enable Celeborn to write to multiple
>> PartitionLocations for a single partition concurrently, which significantly
>> improves skew partition performance
>>
>> Please let me know if you have any comments or questions
>>
>> link:
>>
>> https://docs.google.com/document/d/1CqFswIOP5nR8Cy2THo8tELOwVxUf0ZP_8pDjaYj2HGc/edit?usp=sharing
>>
>> Regards,
>> Erik
>>