[jira] [Closed] (FLINK-11805) A Common External Shuffle Service Framework

Flink Jira Bot (Jira) Thu, 20 May 2021 03:53:39 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-11805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Flink Jira Bot closed FLINK-11805.
----------------------------------
    Resolution: Auto Closed

This issue was labeled "stale-minor" 7 ago and has not received any updates so 
I have gone ahead and closed it.  If you are still affected by this or would 
like to raise the priority of this ticket please re-open, removing the label 
"auto-closed" and raise the ticket priority accordingly.


> A Common External Shuffle Service Framework
> -------------------------------------------
>
>                 Key: FLINK-11805
>                 URL: https://issues.apache.org/jira/browse/FLINK-11805
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Network
>            Reporter: MalcolmSanders
>            Assignee: MalcolmSanders
>            Priority: Minor
>              Labels: auto-closed, stale-assigned
>
> In [FLINK-10653|https://issues.apache.org/jira/browse/FLINK-10653], [~zjwang] 
> has introduced pluggable shuffle manager architecture which abstracts the 
> process of data transfer between stages from flink runtime as shuffle 
> service. Here I'd like to propose a common external shuffle service framework 
> so that a majority of external shuffle services could be achieved more easily 
> by compositing and wrapping this framework as well as implementing a few 
> interfaces according to the specific platform or deployment system.
> As far as I'm concerned, a common external shuffle service scenario:
> (1) a shuffle service daemon process runs on each host machine as a server to 
> provide shuffle data for remote(maybe local) consumers.
> (2) a producer gets a local persistent output directory for writing shuffle 
> data from the shuffle service daemon process of current host machine, and 
> writes shuffle data afterwards.
> (3) a consumer fetch its subpartition data from the shuffle service daemon on 
> the host machine where the partition locates.
> In my point of view, such framework could be applicable to external shuffle 
> services such as YarnShuffleService, KubernetesShuffleService and 
> StandaloneShuffleService. As to KubernetesShuffleService, there is also 
> another plan, named as sidecar mode, to achieve shuffle service on k8s which 
> puts a TM process and a shuffle service process into a pod. As a result, 
> there might be multiple shuffle service daemons running on a host machine, it 
> can still fit into the framework since the only difference might be whether 
> the port of each shuffle service process is fixed or not. Accroding to 
> [~zjwang]'s 
> [proposal|https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager],
>  this case can be handled via UpdatePartitionInfo so that the actual port of 
> each shuffle service process can be updated to the consumers.
> This framework is not intended to handle external shuffle services which use 
> global storages as the media for shuffle data, such as DfsShuffleService, or 
> other implementations which don't request an actual shuffle service role such 
> as RdmaShuffleService.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Closed] (FLINK-11805) A Common External Shuffle Service Framework

Reply via email to