[ https://issues.apache.org/jira/browse/FLINK-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015832#comment-17015832 ]

huweihua commented on FLINK-12122:
----------------------------------

[~trohrmann] I have the same issue as [~liuyufei].

We run Flink in per-job mode and have thousands of jobs that need to be
upgraded from Flink 1.5 to Flink 1.9. The change in the slot scheduling
strategy causes load-balancing problems, which has blocked our upgrade plan.
In addition to the load-balancing issue, we have also run into other problems
caused by the Flink 1.9 scheduling strategy:
# Network bandwidth: tasks of the same type are scheduled onto the same
TaskManager, which causes too much network traffic on that machine.

# Some jobs need to sink to a local agent. With centralized scheduling, the
processing capacity of a single machine is insufficient, which causes a
consumption backlog.

I think a decentralized scheduling strategy, which spreads tasks out across the TaskManagers, is reasonable.

> Spread out tasks evenly across all available registered TaskManagers
> --------------------------------------------------------------------
>
>                 Key: FLINK-12122
>                 URL: https://issues.apache.org/jira/browse/FLINK-12122
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.6.4, 1.7.2, 1.8.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.2, 1.10.0
>
>         Attachments: image-2019-05-21-12-28-29-538.png, image-2019-05-21-13-02-50-251.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> With FLIP-6, we changed the default behaviour of how slots are assigned to
> {{TaskManagers}}. Instead of evenly spreading them out over all registered
> {{TaskManagers}}, we randomly pick slots from {{TaskManagers}} with a
> tendency to first fill up a TM before using another one. This is a regression
> with respect to the pre-FLIP-6 code.
> I suggest changing the behaviour so that we try to evenly distribute slots
> across all available {{TaskManagers}} by considering how many of their slots
> are already allocated.
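
The two placement policies contrasted in the issue can be illustrated with a minimal Python sketch. It is a simplification, not Flink's scheduler code: `TaskManager`, `pick_fill_up_first`, `pick_spread_out` and `allocate` are hypothetical names. The fill-up policy is modelled deterministically (most-utilized TaskManager that still has a free slot) rather than as the random pick described in the issue, and "spread out" is modelled as choosing the TaskManager with the lowest fraction of allocated slots, which is one simple way of considering how many of their slots are already allocated.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TaskManager:
    # Hypothetical stand-in for a registered TaskManager's slot bookkeeping.
    tm_id: str
    total_slots: int
    allocated_slots: int = 0

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.allocated_slots

    @property
    def utilization(self) -> float:
        # Assumes total_slots > 0.
        return self.allocated_slots / self.total_slots


Picker = Callable[[List[TaskManager]], Optional[TaskManager]]


def pick_fill_up_first(tms: List[TaskManager]) -> Optional[TaskManager]:
    # Prefer the most-utilized TM that still has a free slot, so one TM is
    # filled before the next is used (ties go to the first TM in the list).
    candidates = [tm for tm in tms if tm.free_slots > 0]
    return max(candidates, key=lambda tm: tm.utilization) if candidates else None


def pick_spread_out(tms: List[TaskManager]) -> Optional[TaskManager]:
    # Prefer the least-utilized TM, i.e. the one with the smallest share of
    # its slots already allocated (ties go to the first TM in the list).
    candidates = [tm for tm in tms if tm.free_slots > 0]
    return min(candidates, key=lambda tm: tm.utilization) if candidates else None


def allocate(tms: List[TaskManager], num_slots: int, picker: Picker) -> List[str]:
    # Allocate slots one at a time and return the TM id chosen for each slot.
    placement = []
    for _ in range(num_slots):
        tm = picker(tms)
        if tm is None:
            raise RuntimeError("not enough free slots")
        tm.allocated_slots += 1
        placement.append(tm.tm_id)
    return placement


if __name__ == "__main__":
    def fresh_cluster() -> List[TaskManager]:
        return [TaskManager(f"tm{i}", total_slots=4) for i in (1, 2, 3)]

    print(allocate(fresh_cluster(), 6, pick_fill_up_first))
    # ['tm1', 'tm1', 'tm1', 'tm1', 'tm2', 'tm2']
    print(allocate(fresh_cluster(), 6, pick_spread_out))
    # ['tm1', 'tm2', 'tm3', 'tm1', 'tm2', 'tm3']
```

With three TaskManagers of four slots each, the fill-up policy puts all six slots on two machines and leaves the third idle, while the least-utilization policy places two slots on each machine. That even placement is the kind of spread the comment above asks for to avoid concentrating network traffic and sink load on a single host.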



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
