Yuxin,

One question: is the current proposal limited to a single tiered storage implementation, e.g. Celeborn? Or is it possible to have multiple tiered storages, such as a separate cloud storage together with Celeborn?

On Thu, Jun 6, 2024, 9:46 AM Jeyhun Karimov <je.kari...@gmail.com> wrote:
> Hi Yuxin,
>
> +1 for this proposal.
> This change will greatly alleviate the pressure on local storage resources
> (especially when there is limited local storage)
> particularly in the context of cloud-native environments.
>
> Regards,
> Jeyhun
>
> On Thu, Jun 6, 2024 at 1:20 PM Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> > Hi all,
> >
> > Thanks for all the feedback and suggestions so far.
> >
> > If there is no further comment, we will open the voting thread tomorrow.
> >
> > Best,
> > Yuxin
> >
> > On Thu, Jun 6, 2024 at 15:40 Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> > > Thanks Zhu for the suggestion.
> > > I have updated the description of the option.
> > >
> > > Best,
> > > Yuxin
> > >
> > > On Thu, Jun 6, 2024 at 14:59 Zhu Zhu <reed...@gmail.com> wrote:
> > > > +1
> > > >
> > > > Maybe explain in the description of
> > > > `taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class`
> > > > that it only accepts Celeborn as the remote shuffle tier at this moment?
> > > >
> > > > Thanks,
> > > > Zhu
> > > >
> > > > On Thu, Jun 6, 2024 at 13:49 Junrui Lee <jrlee....@gmail.com> wrote:
> > > > > Thanks Yuxin for your answer. +1 for this proposal.
> > > > >
> > > > > Best,
> > > > > Junrui.
> > > > >
> > > > > On Thu, Jun 6, 2024 at 13:42 Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> > > > > > Thanks Junrui for your question.
> > > > > >
> > > > > > > I wonder if the current interface design support the
> > > > > > > future adaptation for batch job recovery
> > > > > >
> > > > > > I noticed that FLIP-383 supports batch job recovery by introducing
> > > > > > some new APIs. These APIs can also be added to the Tier-related
> > > > > > interfaces to facilitate the feature. Since these modifications are not
> > > > > > directly related to the current integration tasks and the integration
> > > > > > does not conflict with the batch job recovery, I propose that this FLIP
> > > > > > doesn't involve these particular changes. Moreover, considering that
> > > > > > the Tier interfaces are not public currently, it is also feasible to add
> > > > > > the interfaces directly if necessary.
> > > > > > WDYT?
> > > > > >
> > > > > > Best,
> > > > > > Yuxin
> > > > > >
> > > > > > On Thu, Jun 6, 2024 at 11:02 Junrui Lee <jrlee....@gmail.com> wrote:
> > > > > > > Thanks Yuxin for driving this proposal!
> > > > > > >
> > > > > > > I have a question about the public interface compatibility in the
> > > > > > > context of FLIP-459. As we've supported batch job recovery from
> > > > > > > jobMaster failures in FLIP-383 which will be released in Flink 1.20.
> > > > > > > I wonder if the current interface design support the future
> > > > > > > adaptation for batch job recovery?
> > > > > > >
> > > > > > > Looking forward to your feedback.
> > > > > > >
> > > > > > > Best,
> > > > > > > Junrui.
> > > > > > >
> > > > > > > On Wed, Jun 5, 2024 at 10:13 weijie guo <guoweijieres...@gmail.com> wrote:
> > > > > > > > Thanks Yuxin for the proposal!
> > > > > > > >
> > > > > > > > When we first proposed Hybrid Shuffle, I wanted to support pluggable
> > > > > > > > storage tier in the future. However, limited by the architecture of
> > > > > > > > the legacy Hybrid Shuffle at that time, this idea has not been
> > > > > > > > realized. The new architecture abstracts the tier nicely, and now
> > > > > > > > it's time to introduce support for external storage.
> > > > > > > >
> > > > > > > > Big +1 for this one!
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Weijie
> > > > > > > >
> > > > > > > > On Wed, Jun 5, 2024 at 00:08 rexxiong <rexxi...@apache.org> wrote:
> > > > > > > > > Thanks Yuxin for the proposal. +1, as a member of the Apache
> > > > > > > > > Celeborn community, I am very excited about the integration of
> > > > > > > > > Flink's Hybrid Shuffle with Apache Celeborn. The whole design of
> > > > > > > > > CIP-6 looks good to me. I am looking forward to this integration.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jiashu Xiong
> > > > > > > > >
> > > > > > > > > On Tue, Jun 4, 2024 at 16:47 Ethan Feng <ethanf...@apache.org> wrote:
> > > > > > > > > > +1 for this proposal.
> > > > > > > > > >
> > > > > > > > > > After internally reviewing the prototype of CIP-6, this would
> > > > > > > > > > improve performance and stability for Flink users using Celeborn.
> > > > > > > > > >
> > > > > > > > > > Expect to see this feature come out to the community.
> > > > > > > > > >
> > > > > > > > > > As I come from the Celeborn community, I hope more users can try
> > > > > > > > > > to use Celeborn when there are Flink batch jobs.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Ethan Feng
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 4, 2024 at 16:34 Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> > > > > > > > > > > Hi, Venkatakrishnan,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for joining the discussion. We appreciate your interest
> > > > > > > > > > > in contributing to the work. Once the FLIP and CIP proposals
> > > > > > > > > > > have been approved, we will create some JIRA tickets in Flink
> > > > > > > > > > > and Celeborn projects. Please feel free to take a look at the
> > > > > > > > > > > tickets and select any that resonate with your interests.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Yuxin
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 31, 2024 at 23:11 Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
> > > > > > > > > > > > Thanks for this FLIP. We are also interested in
> > > > > > > > > > > > learning/contributing to the hybrid shuffle integration with
> > > > > > > > > > > > celeborn for batch executions.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, May 28, 2024, 7:07 PM Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> > > > > > > > > > > > > Hi, Xintong,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I think we can also publish the prototype codes so the
> > > > > > > > > > > > > > community can better understand and help with it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ok, I agree on the point. I will prepare and publish the
> > > > > > > > > > > > > code recently.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Rui,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Kindly reminder: the image of CIP-6[1] cannot be loaded.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for the reminder. I've updated the images.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Yuxin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, May 29, 2024 at 09:33 Rui Fan <1996fan...@gmail.com> wrote:
> > > > > > > > > > > > > > Thanks Yuxin for driving this proposal!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Kindly reminder: the image of CIP-6[1] cannot be loaded.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6*Support*Flink*hybrid*shuffle*integration*with*Apache*Celeborn__;KysrKysrKys!!IKRxdwAv5BmarQ!ZRTc1aUSYMDBazuIwlet1Dzk2_DD9qKTgoDLH9jSwAVLgwplcuId_8JoXkH0i7AeWxKWXkL0sxM3AeW-H9OJ6v9uGw$
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Rui
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, May 29, 2024 at 9:03 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > > > > > > > +1 for this proposal.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We have been prototyping this feature internally at
> > > > > > > > > > > > > > > Alibaba for a couple of months. Yuxin, I think we can
> > > > > > > > > > > > > > > also publish the prototype codes so the community can
> > > > > > > > > > > > > > > better understand and help with it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Xintong
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, May 28, 2024 at 8:34 PM Yuxin Tan <tanyuxinw...@gmail.com> wrote:
> > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I would like to start a discussion on FLIP-459 Support
> > > > > > > > > > > > > > > > Flink hybrid shuffle integration with Apache
> > > > > > > > > > > > > > > > Celeborn[1]. Flink hybrid shuffle supports transitions
> > > > > > > > > > > > > > > > between memory, disk, and remote storage to improve
> > > > > > > > > > > > > > > > performance and job stability. Concurrently, Apache
> > > > > > > > > > > > > > > > Celeborn provides a stable, performant, scalable remote
> > > > > > > > > > > > > > > > shuffle service. This integration proposal is to harness
> > > > > > > > > > > > > > > > the benefits from both hybrid shuffle and Celeborn
> > > > > > > > > > > > > > > > simultaneously.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Note that this proposal has two parts.
> > > > > > > > > > > > > > > > 1. The Flink-side modifications are in FLIP-459[1].
> > > > > > > > > > > > > > > > 2. The Celeborn-side changes are in CIP-6[2].
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Looking forward to everyone's feedback and suggestions.
> > > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1] https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-459*3A*Support*Flink*hybrid*shuffle*integration*with*Apache*Celeborn__;JSsrKysrKysr!!IKRxdwAv5BmarQ!ZRTc1aUSYMDBazuIwlet1Dzk2_DD9qKTgoDLH9jSwAVLgwplcuId_8JoXkH0i7AeWxKWXkL0sxM3AeW-H9MaOGE7hQ$
> > > > > > > > > > > > > > > > [2] https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6*Support*Flink*hybrid*shuffle*integration*with*Apache*Celeborn__;KysrKysrKys!!IKRxdwAv5BmarQ!ZRTc1aUSYMDBazuIwlet1Dzk2_DD9qKTgoDLH9jSwAVLgwplcuId_8JoXkH0i7AeWxKWXkL0sxM3AeW-H9OJ6v9uGw$
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Yuxin