Hi Nicholas,

Thanks for the valuable feedback.

> 1. Could you describe in detail what functions the relevant components
> mentioned in Proposed Changes

These components are simply the pluggable implementations of the Celeborn
tier. The details of the tiers and the mechanism for switching between them
are described in the previous FLIP[1]. Celeborn is added to hybrid shuffle
as a new tier and works similarly to the existing tiers, such as the Memory
tier and the Disk tier. In this tiered storage, agents serve as the entry
points for interaction between the framework and the different tiers. For
instance, CelebornProducerAgent acts as the entry point for producers to
emit data into the tier. If you still have similar questions after reading
that FLIP, please feel free to let me know.

> 2. Can you briefly introduce how to guarantee compatibility with
> Celeborn’s existing features such as partition splitting?

This integration is a new way to make Celeborn work with Flink, so the
compatibility of the existing shuffle service mode is not affected. The new
integration will also support the features of the old mode. For example,
partition splitting will be supported by trying to open the stream from the
next partition once the previous partition has been read completely. Since
these features are all implementation details, I initially left them out of
the CIP to keep it focused, simple, and easy to understand. Following your
question, I have added some feature details to it.
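As a rough illustration of the "open the next partition when the previous
one is read completely" idea, here is a minimal sketch. It is not the actual
Celeborn or Flink code: the class name SplitPartitionReader and the use of
plain InputStream suppliers to stand in for split partitions are assumptions
made only for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch: exposes a sequence of split partitions as one stream. */
public class SplitPartitionReader extends InputStream {
    // Each supplier lazily opens the stream of one split partition (assumed API).
    private final Iterator<Supplier<InputStream>> openers;
    private InputStream current;

    public SplitPartitionReader(List<Supplier<InputStream>> openers) {
        this.openers = openers.iterator();
    }

    @Override
    public int read() throws IOException {
        while (true) {
            if (current == null) {
                if (!openers.hasNext()) {
                    return -1; // every split partition has been consumed
                }
                current = openers.next().get(); // open the next split lazily
            }
            int b = current.read();
            if (b != -1) {
                return b;
            }
            // The previous partition is fully read: close it and move on.
            current.close();
            current = null;
        }
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

The reader only opens a split when the one before it is exhausted, so a
consumer sees a single continuous stream regardless of how many times the
partition was split.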

> 3. Is there any public configuration of integration with Hybrid Shuffle
> and Flink client?

Yes, a new Flink configuration option is added, which is described in the
FLIP[2].


> 4. How does the server side guarantee the accuracy and recoverability of
> Segment information?

Like other write-related information, the segment info is also added to
FileInfo, and it is protected by the lock to guarantee accuracy.
Recoverability is achieved through serialization and deserialization, the
same as for the other fields.
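
To make the idea concrete, below is a minimal sketch of segment info that is
guarded by a lock and can be serialized and restored. It is only an
illustration under assumptions: the class SegmentInfoSketch, its fields, and
the byte layout are hypothetical and are not Celeborn's real FileInfo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the segment-related part of a file info object. */
public class SegmentInfoSketch {
    private final Object lock = new Object();
    // subpartitionId -> (segmentId -> index of the first buffer in that segment)
    private final Map<Integer, Map<Integer, Integer>> segments = new TreeMap<>();

    public void addSegment(int subpartitionId, int segmentId, int firstBufferIndex) {
        synchronized (lock) { // writers and readers share one lock
            segments.computeIfAbsent(subpartitionId, k -> new TreeMap<>())
                    .put(segmentId, firstBufferIndex);
        }
    }

    public Integer firstBufferIndex(int subpartitionId, int segmentId) {
        synchronized (lock) {
            Map<Integer, Integer> bySegment = segments.get(subpartitionId);
            return bySegment == null ? null : bySegment.get(segmentId);
        }
    }

    /** Serializes the segment info, e.g. so it can be persisted for recovery. */
    public byte[] serialize() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            synchronized (lock) {
                out.writeInt(segments.size());
                for (Map.Entry<Integer, Map<Integer, Integer>> sub : segments.entrySet()) {
                    out.writeInt(sub.getKey());
                    out.writeInt(sub.getValue().size());
                    for (Map.Entry<Integer, Integer> seg : sub.getValue().entrySet()) {
                        out.writeInt(seg.getKey());
                        out.writeInt(seg.getValue());
                    }
                }
            }
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the segment info from bytes produced by serialize(). */
    public static SegmentInfoSketch deserialize(byte[] data) throws IOException {
        SegmentInfoSketch info = new SegmentInfoSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int subpartitionCount = in.readInt();
            for (int i = 0; i < subpartitionCount; i++) {
                int subpartitionId = in.readInt();
                int segmentCount = in.readInt();
                for (int j = 0; j < segmentCount; j++) {
                    int segmentId = in.readInt();
                    info.addSegment(subpartitionId, segmentId, in.readInt());
                }
            }
        }
        return info;
    }
}
```

In this sketch the same lock covers updates, lookups, and serialization, so
a snapshot taken for recovery never observes a half-written update.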

> 5. Should Celeborn wait until FLIP-459 is released before releasing this
> integration? Which Flink version will release FLIP-459?

Celeborn's integration should wait for FLIP-459 to be released, because the
feature relies on both CIP-6 and FLIP-459 to function correctly. If all goes
well, FLIP-459 could be part of Flink's next release, Flink 1.20.


Hi Keyong,

Thanks for the reminder and for your interest in the Reduce Partition. Once
the Map Partition part is finished, we will continue to work on it as soon
as possible.


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn

Best,
Yuxin


Keyong Zhou <[email protected]> wrote on Sat, Jun 8, 2024 at 13:00:

> Hi Yuxin and Xintong,
>
> Really excited to see Flink and Celeborn communities collaborate
> more on shuffle component! I believe this will inspire more for both sides
> :)
>
> +1 for this proposal, looking forward to see this feature to make progress.
>
> Also I'm very interested in integrating Flink Hybrid Shuffle with
> Celeborn's Reduce Partition as mentioned in the doc in the future, which I
> believe will benefit more for very large shuffle operators :)
>
> Regards,
> Keyong Zhou
>
> Nicholas Jiang <[email protected]> wrote on Thu, Jun 6, 2024 at 13:25:
>
> > Hi Yuxin,
> >
> > Thanks for driving this CIP about integration with Hybrid Shuffle. I have
> > some comments on this CIP:
> >
> > 1. Could you describe in detail what functions the relevant components
> > mentioned in Proposed Changes, including CelebornProducerAgent,
> > CelebornConsumerAgent, CelebornMasterAgent, etc., support? In the design
> > document, these components are only mentioned and no any details of
> > changes.
> >
> > 2. Can you briefly introduce how to guarantee compatibility with
> > Celeborn’s existing features such as partition splitting? IMO, the
> > compatibility introduction should be mentioned in Proposed Changes to
> > help community developers understand.
> >
> > 3. There are no changes on public interfaces. Is there any public
> > configuration of integration with Hybrid Shuffle and Flink client?
> >
> > 4. The server side must store Segment information for each subpartition.
> > How does the server side guarantee the accuracy and recoverability of
> > Segment information?
> >
> > 5. Should Celeborn wait until FLIP-459 is released before releasing this
> > integration? Which Flink version will release FLIP-459?
> >
> > Regards,
> > Nicholas Jiang
> >
> > On 2024/05/28 12:51:32 Yuxin Tan wrote:
> > > Hi all,
> > >
> > > I would like to start a discussion on CIP-6 Support Flink hybrid
> > > shuffle integration with Apache Celeborn[1]. Celeborn provides a
> > > stable, performant, scalable remote shuffle service. Concurrently,
> > > Flink hybrid shuffle supports transitions between memory, disk, and
> > > remote storage to improve performance and job stability. This
> > > integration proposal is to harness the benefits from both Celeborn
> > > and hybrid shuffle simultaneously.
> > >
> > > Note that this proposal has two parts.
> > > 1. The Celeborn-side changes are in CIP-6[1].
> > > 2. The Flink-side modifications are in FLIP-459[2].
> > >
> > > Looking forward to everyone's feedback and suggestions. Thank you!
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > > [2]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> > >
> > > Best,
> > > Yuxin
> > >
> >
>
