[jira] [Commented] (FLINK-11805) A Common External Shuffle Service Framework

Flink Jira Bot (Jira) Thu, 22 Apr 2021 15:48:20 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-11805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329510#comment-17329510
 ]


Flink Jira Bot commented on FLINK-11805:
----------------------------------------

This issue is assigned but has not received an update in 7 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> A Common External Shuffle Service Framework
> -------------------------------------------
>
>                 Key: FLINK-11805
>                 URL: https://issues.apache.org/jira/browse/FLINK-11805
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Network
>            Reporter: MalcolmSanders
>            Assignee: MalcolmSanders
>            Priority: Minor
>              Labels: stale-assigned, stale-minor
>
> In [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. Here I'd like to propose a common external shuffle service framework 
> so that a majority of external shuffle services could be achieved more easily 
> by compositing and wrapping this framework as well as implementing a few 
> interfaces according to the specific platform or deployment system.
> As far as I'm concerned, a common external shuffle service scenario:
> (1) a shuffle service daemon process runs on each host machine as a server to 
> provide shuffle data for remote(maybe local) consumers.
> (2) a producer gets a local persistent output directory for writing shuffle 
> data from the shuffle service daemon process of current host machine, and 
> writes shuffle data afterwards.
> (3) a consumer fetch its subpartition data from the shuffle service daemon on 
> the host machine where the partition locates.
> In my point of view, such framework could be applicable to external shuffle 
> services such as YarnShuffleService, KubernetesShuffleService and 
> StandaloneShuffleService. As to KubernetesShuffleService, there is also 
> another plan, named as sidecar mode, to achieve shuffle service on k8s which 
> puts a TM process and a shuffle service process into a pod. As a result, 
> there might be multiple shuffle service daemons running on a host machine, it 
> can still fit into the framework since the only difference might be whether 
> the port of each shuffle service process is fixed or not. Accroding to 
> [~zjwang]'s 
> [proposal|https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager],
>  this case can be handled via UpdatePartitionInfo so that the actual port of 
> each shuffle service process can be updated to the consumers.
> This framework is not intended to handle external shuffle services which use 
> global storages as the media for shuffle data, such as DfsShuffleService, or 
> other implementations which don't request an actual shuffle service role such 
> as RdmaShuffleService.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-11805) A Common External Shuffle Service Framework

Reply via email to