Thanks Yuxin for driving this proposal!

I have a question about public interface compatibility in the context of FLIP-459. FLIP-383, which will be released in Flink 1.20, adds support for batch job recovery from JobMaster failures. Does the current interface design allow for future adaptation to batch job recovery?
Looking forward to your feedback.

Best,
Junrui

weijie guo <guoweijieres...@gmail.com> wrote on Wed, Jun 5, 2024 at 10:13:

> Thanks Yuxin for the proposal!
>
> When we first proposed Hybrid Shuffle, I wanted to support pluggable
> storage tiers in the future. However, limited by the architecture of the
> legacy Hybrid Shuffle at that time, this idea was not realized. The
> new architecture abstracts the tier nicely, and now it's time to introduce
> support for external storage.
>
> Big +1 for this one!
>
> Best regards,
>
> Weijie
>
> rexxiong <rexxi...@apache.org> wrote on Wed, Jun 5, 2024 at 00:08:
>
> > Thanks Yuxin for the proposal. +1. As a member of the Apache Celeborn
> > community, I am very excited about the integration of Flink's Hybrid
> > Shuffle with Apache Celeborn. The whole design of CIP-6 looks good to me.
> > I am looking forward to this integration.
> >
> > Thanks,
> > Jiashu Xiong
> >
> > Ethan Feng <ethanf...@apache.org> wrote on Tue, Jun 4, 2024 at 16:47:
> >
> > > +1 for this proposal.
> > >
> > > After internally reviewing the prototype of CIP-6, this would improve
> > > performance and stability for Flink users using Celeborn.
> > >
> > > Expect to see this feature come out to the community.
> > >
> > > As I come from the Celeborn community, I hope more users can try to
> > > use Celeborn when there are Flink batch jobs.
> > >
> > > Thanks,
> > > Ethan Feng
> > >
> > > Yuxin Tan <tanyuxinw...@gmail.com> wrote on Tue, Jun 4, 2024 at 16:34:
> > >
> > > > Hi, Venkatakrishnan,
> > > >
> > > > Thanks for joining the discussion. We appreciate your interest
> > > > in contributing to the work. Once the FLIP and CIP proposals
> > > > have been approved, we will create some JIRA tickets in Flink
> > > > and Celeborn projects. Please feel free to take a look at the
> > > > tickets and select any that resonate with your interests.
> > > >
> > > > Best,
> > > > Yuxin
> > > >
> > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote on Fri, May 31, 2024 at 23:11:
> > > >
> > > > > Thanks for this FLIP.
> > > > > We are also interested in learning/contributing to
> > > > > the hybrid shuffle integration with Celeborn for batch executions.
> > > > >
> > > > > On Tue, May 28, 2024, 7:07 PM Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> > > > >
> > > > > > Hi, Xintong,
> > > > > >
> > > > > > > I think we can also publish the prototype code so the
> > > > > > > community can better understand and help with it.
> > > > > >
> > > > > > OK, I agree on this point. I will prepare and publish the code
> > > > > > soon.
> > > > > >
> > > > > > Rui,
> > > > > >
> > > > > > > Kind reminder: the image of CIP-6[1] cannot be loaded.
> > > > > >
> > > > > > Thanks for the reminder. I've updated the images.
> > > > > >
> > > > > > Best,
> > > > > > Yuxin
> > > > > >
> > > > > > Rui Fan <1996fan...@gmail.com> wrote on Wed, May 29, 2024 at 09:33:
> > > > > >
> > > > > > > Thanks Yuxin for driving this proposal!
> > > > > > >
> > > > > > > Kind reminder: the image of CIP-6[1] cannot be loaded.
> > > > > > >
> > > > > > > [1]
> > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6*Support*Flink*hybrid*shuffle*integration*with*Apache*Celeborn__;KysrKysrKys!!IKRxdwAv5BmarQ!ZRTc1aUSYMDBazuIwlet1Dzk2_DD9qKTgoDLH9jSwAVLgwplcuId_8JoXkH0i7AeWxKWXkL0sxM3AeW-H9OJ6v9uGw$
> > > > > > >
> > > > > > > Best,
> > > > > > > Rui
> > > > > > >
> > > > > > > On Wed, May 29, 2024 at 9:03 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > >
> > > > > > > > +1 for this proposal.
> > > > > > > >
> > > > > > > > We have been prototyping this feature internally at Alibaba for a
> > > > > > > > couple of months. Yuxin, I think we can also publish the prototype
> > > > > > > > code so the community can better understand and help with it.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Xintong
> > > > > > > >
> > > > > > > > On Tue, May 28, 2024 at 8:34 PM Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I would like to start a discussion on FLIP-459 Support Flink hybrid
> > > > > > > > > shuffle integration with Apache Celeborn[1]. Flink hybrid shuffle
> > > > > > > > > supports transitions between memory, disk, and remote storage to
> > > > > > > > > improve performance and job stability. Concurrently, Apache Celeborn
> > > > > > > > > provides a stable, performant, scalable remote shuffle service. This
> > > > > > > > > integration proposal is to harness the benefits from both hybrid
> > > > > > > > > shuffle and Celeborn simultaneously.
> > > > > > > > >
> > > > > > > > > Note that this proposal has two parts.
> > > > > > > > > 1. The Flink-side modifications are in FLIP-459[1].
> > > > > > > > > 2. The Celeborn-side changes are in CIP-6[2].
> > > > > > > > >
> > > > > > > > > Looking forward to everyone's feedback and suggestions. Thank you!
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-459*3A*Support*Flink*hybrid*shuffle*integration*with*Apache*Celeborn__;JSsrKysrKysr!!IKRxdwAv5BmarQ!ZRTc1aUSYMDBazuIwlet1Dzk2_DD9qKTgoDLH9jSwAVLgwplcuId_8JoXkH0i7AeWxKWXkL0sxM3AeW-H9MaOGE7hQ$
> > > > > > > > > [2]
> > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6*Support*Flink*hybrid*shuffle*integration*with*Apache*Celeborn__;KysrKysrKys!!IKRxdwAv5BmarQ!ZRTc1aUSYMDBazuIwlet1Dzk2_DD9qKTgoDLH9jSwAVLgwplcuId_8JoXkH0i7AeWxKWXkL0sxM3AeW-H9OJ6v9uGw$
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yuxin