Hello Yuxin,

    Thanks for your proposal! Adding remote storage support to Flink's 
Hybrid Shuffle is a significant improvement, since it addresses the limited 
local disk space available for shuffle data. With remote storage as a 
fallback, a batch job no longer has to fail when local disk space runs out 
during shuffle, which lets Flink handle larger workloads and more complex 
data processing tasks. Using local storage first and falling back to remote 
storage only when needed will make Hybrid Shuffle more versatile and 
scalable, especially for batch jobs in cloud-native environments.
    Besides, I have a small question about the segment size in the different 
storage tiers. According to the FLIP, the segment size may be fixed for each 
storage tier, but I think a fixed size may affect shuffle performance. For 
example, a smaller segment size improves the utilization of the Memory 
Storage Tier, but it may bring extra cost to the Disk Storage Tier or the 
Remote Storage Tier, since more, smaller segments have to be written and 
read for the same amount of data. Deciding the segment size dynamically 
could be helpful.
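To make the idea a bit more concrete, here is a rough sketch of what a 
per-tier segment size policy could look like. Everything in it is 
hypothetical: the `StorageTier` enum, the base sizes, and the 0.8 
memory-usage threshold are illustrative assumptions, not part of FLIP-301 
or Flink's API.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical sketch: pick a segment size per storage tier instead of a
 * single fixed one. Names and thresholds are illustrative only, not FLIP-301.
 */
class DynamicSegmentSizeSketch {

    enum StorageTier { MEMORY, DISK, REMOTE }

    // Illustrative base sizes in bytes; real values would need benchmarking.
    private static final Map<StorageTier, Long> BASE_SIZE = new EnumMap<>(StorageTier.class);

    static {
        BASE_SIZE.put(StorageTier.MEMORY, 1L << 20);  // 1 MiB
        BASE_SIZE.put(StorageTier.DISK, 8L << 20);    // 8 MiB
        BASE_SIZE.put(StorageTier.REMOTE, 64L << 20); // 64 MiB
    }

    /**
     * Returns the segment size for a tier.
     *
     * @param memoryUsageRatio fraction of the memory tier currently in use (0.0 to 1.0);
     *                         only consulted for the MEMORY tier
     */
    static long segmentSize(StorageTier tier, double memoryUsageRatio) {
        long base = BASE_SIZE.get(tier);
        if (tier == StorageTier.MEMORY && memoryUsageRatio > 0.8) {
            // Under memory pressure, use smaller segments so memory is
            // filled and released at a finer granularity (never below 256 KiB).
            return Math.max(base / 4, 256L << 10);
        }
        // Disk and remote tiers keep larger segments so that per-segment
        // overhead is amortized over more data.
        return base;
    }

    public static void main(String[] args) {
        System.out.println(segmentSize(StorageTier.MEMORY, 0.5)); // 1048576
        System.out.println(segmentSize(StorageTier.MEMORY, 0.9)); // 262144
        System.out.println(segmentSize(StorageTier.REMOTE, 0.9)); // 67108864
    }
}
```

The point is only that the memory tier could shrink its segments under 
pressure while the disk and remote tiers keep larger ones; the actual sizes 
and thresholds would have to come from benchmarks.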
 
Best,


Wencong Liu

At 2023-03-06 13:51:21, "Yuxin Tan" <tanyuxinw...@gmail.com> wrote:
>Hi everyone,
>
>I would like to start a discussion on FLIP-301: Hybrid Shuffle supports
>Remote Storage[1].
>
>In the cloud-native environment, it is difficult to determine the
>appropriate disk space for Batch shuffle, which will affect job stability.
>
>This FLIP is to support Remote Storage for Hybrid Shuffle to improve the
>Batch job stability in the cloud-native environment.
>
>The goals of this FLIP are as follows.
>1. By default, use the local memory and disk to ensure high shuffle
>performance if the local storage space is sufficient.
>2. When the local storage space is insufficient, use remote storage as
>a supplement to avoid large-scale Batch job failure.
>
>Looking forward to hearing from you.
>
>[1]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage
>
>Best,
>Yuxin
