Hi Yuxin,

Thanks a lot for the CIP. +1.

To me, the overall design and implementation look clear and raise no compatibility issues. My only concern is whether the Celeborn worker changes support graceful shutdown/decommission. Could you provide more details on that?

Thanks,
Jiashu Xiong

On Wed, May 29, 2024 at 9:39 AM Xintong Song <[email protected]> wrote:

> +1 for this proposal.
>
> Greetings to the Apache Celeborn community~! Yuxin and I are from the
> Apache Flink community, and have been working on the shuffle-related
> components for years. We are both excited about making our first
> contribution to the Apache Celeborn community.
>
> Hybrid Shuffle is a new shuffle architecture that the Flink community has
> been working on for ~2 years. We are planning to make it the default (and
> eventually the only) batch shuffle in the Flink 2.0 release (end of this
> year). The architecture is flexible and extensible so that it can support
> all the capabilities of existing shuffle modes, while providing new
> advantages on task scheduling, resource efficiency and usability. To
> achieve this, we abstract storages (memory, local disk, remote storage /
> service) into Tiers, and hide details such as assembling records to
> buffers, dynamic switching between Tiers and memory management from the
> Tiers.
>
> We believe it is important that Flink and Celeborn can be integrated under
> the new architecture, in addition to the existing integration based on the
> shuffle-service interfaces.
>
> Looking forward to your feedback.
>
> Best,
> Xintong
>
> On Tue, May 28, 2024 at 8:52 PM Yuxin Tan <[email protected]> wrote:
>
> > Hi all,
> >
> > I would like to start a discussion on CIP-6 Support Flink hybrid shuffle
> > integration with Apache Celeborn[1]. Celeborn provides a stable,
> > performant, scalable remote shuffle service. Concurrently, Flink hybrid
> > shuffle supports transitions between memory, disk, and remote storage to
> > improve performance and job stability. This integration proposal is to
> > harness the benefits from both Celeborn and hybrid shuffle
> > simultaneously.
> >
> > Note that this proposal has two parts.
> > 1. The Celeborn-side changes are in CIP-6[1].
> > 2. The Flink-side modifications are in FLIP-459[2].
> >
> > Looking forward to everyone's feedback and suggestions. Thank you!
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> >
> > Best,
> > Yuxin
