Hi all,

Thanks for all the feedback and suggestions so far.

The discussion has been going on for some time. If there are no
further comments, we will start voting today.
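To make the tiered idea discussed below a bit more concrete, here is a rough sketch of how a writer might pick a storage tier for each shuffle segment. It tries memory first, then local disk, and uses remote storage only as a supplement once local capacity runs out. This is illustrative only. The class, method, and tier names are hypothetical and are not Flink's API. The sketch also ignores releasing space after data is consumed, eviction, and concurrency.

```java
import java.util.List;

/**
 * Hypothetical sketch (NOT Flink's actual API): choose a storage tier for a
 * shuffle segment, preferring memory, then local disk, and falling back to
 * remote storage only when local capacity is exhausted.
 */
public class TierSelectionSketch {

    /** Hypothetical tier with a simple byte budget; space is never released here. */
    static final class Tier {
        final String name;
        final long capacityBytes;
        long usedBytes;

        Tier(String name, long capacityBytes) {
            this.name = name;
            this.capacityBytes = capacityBytes;
        }

        /** Reserves space if the segment fits; returns false otherwise. */
        boolean tryReserve(long bytes) {
            if (usedBytes + bytes > capacityBytes) {
                return false;
            }
            usedBytes += bytes;
            return true;
        }
    }

    private final List<Tier> tiersInPriorityOrder;

    TierSelectionSketch(List<Tier> tiersInPriorityOrder) {
        this.tiersInPriorityOrder = tiersInPriorityOrder;
    }

    /** Returns the name of the first tier (in priority order) with room for the segment. */
    String selectTier(long segmentBytes) {
        for (Tier tier : tiersInPriorityOrder) {
            if (tier.tryReserve(segmentBytes)) {
                return tier.name;
            }
        }
        throw new IllegalStateException("No tier can hold " + segmentBytes + " bytes");
    }

    public static void main(String[] args) {
        // Hypothetical budgets: small memory and local-disk tiers, effectively unbounded remote tier.
        TierSelectionSketch selector = new TierSelectionSketch(List.of(
                new Tier("memory", 64),
                new Tier("local-disk", 128),
                new Tier("remote", Long.MAX_VALUE)));
        for (int i = 0; i < 5; i++) {
            System.out.println("segment " + i + " -> " + selector.selectTier(48));
        }
    }
}
```

With a 64-byte memory budget and a 128-byte local-disk budget, the five 48-byte segments in main land in memory, local disk, local disk, remote, and remote. Remote storage is only touched after local capacity is used up, which matches the "supplement, not replace" intent.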

Best,
Yuxin


Yuxin Tan <tanyuxinw...@gmail.com> wrote on Fri, Mar 17, 2023 at 12:52:

> Hi,
> Thanks for joining the discussion and giving the ideas.
>
> @ron
> > can the Hybrid Shuffle replace the RSS in the future?
>
> Hybrid Shuffle and RSS offer distinct solutions to the shuffle
> challenge. To optimize performance, we store shuffle data in
> different tiers (memory and disk), which gives greater flexibility
> and ease of use. Specifically, we cache intermediate data in memory
> to reduce disk I/O overhead.
> In contrast, RSS is a standalone service that can run across
> multiple servers in a cluster, parallelizing shuffle operations
> to improve performance. However, this introduces additional
> deployment and maintenance costs. Each approach has its own
> benefits and drawbacks, and users should be able to choose the
> method that best suits their needs. So I don't think Hybrid Shuffle
> will replace RSS in the future.
>
> @ConradJam
> > Should we define a data acceleration layer like Alluxio in remote
> > storage?
>
> I'm not entirely clear on the detailed plan you've proposed, but I
> understand that you want to use Alluxio as a cache layer for the
> remote storage tier. Alluxio is designed to provide low-latency data
> access to applications through a distributed caching layer. However,
> adopting Alluxio introduces additional dependencies and
> deployment/maintenance costs for users. Our design approach is to
> supplement local storage with remote storage, since local storage
> is generally sufficient. Given these limited usage scenarios,
> introducing such costs for this optimization may not be worthwhile.
> Additionally, for users, added dependencies mean increased complexity.
>
> Best,
> Yuxin
>
>
> ConradJam <jam.gz...@gmail.com> wrote on Fri, Mar 17, 2023 at 11:11:
>
>> Thanks for starting this discussion.
>>
>>
>> I am a bit confused about the definition of the memory layer here, since
>> it refers to local memory. Should we define a data acceleration layer like
>> Alluxio [1] in remote storage?
>>
>>
>> Let me cite a scenario: if I use Fluid [2] to mount an AlluxioRuntime [3]
>> on K8S, it looks like a local disk (but it is actually remote memory
>> storage). Have we specified this behavior, or optimized for this
>> scenario?
>>
>>
>> [1] What is Alluxio:
>> https://docs.alluxio.io/os/user/stable/en/Overview.html
>>
>> [2] Fluid: https://fluid-cloudnative.github.io/
>>
>> [3] Fluid Alluxio Runtime:
>> https://fluid-cloudnative.github.io/samples/tieredstore_config.html#prerequisites
>>
>> liu ron <ron9....@gmail.com> wrote on Fri, Mar 17, 2023 at 10:39:
>>
>> > Hi, Yuxin,
>> >
>> > Thanks for creating this FLIP. Adding remote storage capability to
>> > Flink's Hybrid Shuffle is a significant improvement that addresses
>> > the limitations of local disk storage, and it can also improve the
>> > stability of Flink batch jobs.
>> > I just have one question: can the Hybrid Shuffle replace the RSS in the
>> > future? Since Hybrid Shuffle will have remote storage capability, I
>> > think maybe we won't need to maintain a standalone RSS, which would
>> > simplify our operations work.
>> >
>>
>>
>> --
>> Best
>>
>> ConradJam
>>
>
