Re: [DISCUSS] FLIP-301: Hybrid Shuffle supports Remote Storage

Junrui Lee Tue, 07 Mar 2023 08:24:13 -0800

Hi Yuxin,

This FLIP looks quite reasonable. Flink can solve the problem of Batch
shuffle by
combining local and remote storage, and can use fixed local disks for
better performance
 in most scenarios, while using remote storage as a supplement when local
disks are not
 sufficient, avoiding wasteful costs and poor job stability. Moreover, the
solution also
considers the issue of dynamic switching, which can automatically switch to
remote
storage when the local disk is full, saving costs, and automatically switch
back when
there is available space on the local disk.


As Wencong Liu stated, an appropriate segment size is essential, as it can
significantly
affect shuffle performance. I also agree that the first version should
focus mainly on the
design and implementation. However, I have a small question about FLIP. I
did not see
any information regarding the segment size of memory, local disk, and
remote storage
in this FLIP. Are these three values fixed at present? If they are fixed, I
suggest that FLIP
could provide clearer explanations. Moreover, although a dynamic segment
size
mechanism is not necessary at the moment, can we provide configuration
options for users
 to manually adjust these sizes? I think it might be useful.

Best,
Junrui.

Yuxin Tan <tanyuxinw...@gmail.com> 于2023年3月7日周二 20:14写道：

> Thanks for joining the discussion.
>
> @weijie guo
> > 1. How to optimize the broadcast result partition?
> For the partitions with multi-consumers, e.g., broadcast result partition,
> partition reuse,
> speculative, etc, the processing logic is the same as the original Hybrid
> Shuffle, that is,
> using the full spilling strategy. It indeed may reduce the opportunity to
> consume from
> memory, but the PoC shows that it has no effect on the performance
> basically.
>
> > 2. Can the new proposal completely avoid this problem of inaccurate
> backlog
> calculation?
> Yes, this can avoid the problem completely. About the read buffers, the N
> is to reserve
> one exclusive buffer per channel, which is to avoid the deadlock because
> the buffers
> are acquired by some channels and other channels can not request any
> buffers. But
> the buffers except for the N can be floating (competing to request the
> buffers) by all
> channels.
>
> @Wencong Liu
> > Deciding the Segment size dynamically will be helpful.
> I agree that it may be better if the segment size is dynamically decided,
> but for simplifying
> the implementation of the first version, we want to make this a fixed value
> for each tier.
> In the future, this can be a good improvement if necessary. In the first
> version, we will mainly
> focus on the more important features, such as the tiered storage
> architecture, dynamic
> switching tiers, supporting remote storage, memory management, etc.
>
> Best,
> Yuxin
>
>
> Wencong Liu <liuwencle...@163.com> 于2023年3月7日周二 16:48写道：
>
> > Hello Yuxin,
> >
> >
> >     Thanks for your proposal! Adding remote storage capability to Flink's
> > Hybrid Shuffle is a significant improvement that addresses the issue of
> > local disk storage limitations. This enhancement not only ensures
> > uninterrupted Shuffle, but also enables Flink to handle larger workloads
> > and more complex data processing tasks. With the ability to seamlessly
> > shift between local and remote storage, Flink's Hybrid Shuffle will be
> more
> > versatile and scalable, making it an ideal choice for organizations
> looking
> > to build distributed data processing applications with ease.
> >     Besides, I've a small question about the size of Segment in different
> > storages. According to the FLIP, the size of Segment may be fixed for
> each
> > Storage Tier, but I think the fixed size may affect the shuffle
> > performance. For example, smaller segment size will improve the
> utilization
> > rate of Memory Storage Tier, but it may brings extra cost to Disk Storage
> > Tier or Remote Storage Tier. Deciding the size of Segment dynamicly will
> be
> > helpful.
> >
> > Best,
> >
> >
> > Wencong Liu
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > At 2023-03-06 13:51:21, "Yuxin Tan" <tanyuxinw...@gmail.com> wrote:
> > >Hi everyone,
> > >
> > >I would like to start a discussion on FLIP-301: Hybrid Shuffle supports
> > >Remote Storage[1].
> > >
> > >In the cloud-native environment, it is difficult to determine the
> > >appropriate
> > >disk space for Batch shuffle, which will affect job stability.
> > >
> > >This FLIP is to support Remote Storage for Hybrid Shuffle to improve the
> > >Batch job stability in the cloud-native environment.
> > >
> > >The goals of this FLIP are as follows.
> > >1. By default, use the local memory and disk to ensure high shuffle
> > >performance if the local storage space is sufficient.
> > >2. When the local storage space is insufficient, use remote storage as
> > >a supplement to avoid large-scale Batch job failure.
> > >
> > >Looking forward to hearing from you.
> > >
> > >[1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage
> > >
> > >Best,
> > >Yuxin
> >
>

Re: [DISCUSS] FLIP-301: Hybrid Shuffle supports Remote Storage

Reply via email to