[ https://issues.apache.org/jira/browse/FLINK-12122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015832#comment-17015832 ]
huweihua commented on FLINK-12122:
----------------------------------

[~trohrmann] I have the same issue as [~liuyufei].

We run Flink in per-job mode and have thousands of jobs that need to be upgraded from Flink 1.5 to Flink 1.9. The change in scheduling strategy causes a load-balancing issue, which has blocked our upgrade plan.

In addition to the load-balancing issue, we have encountered other problems caused by the Flink 1.9 scheduling strategy:
# Network bandwidth. Tasks of the same type are scheduled onto a single TaskManager, causing too much network traffic on that machine.
# Some jobs need to sink to a local agent. With tasks concentrated on one machine, that machine's processing capacity is insufficient and a consumption backlog builds up.

I think a decentralized scheduling strategy is reasonable.

> Spread out tasks evenly across all available registered TaskManagers
> --------------------------------------------------------------------
>
>                 Key: FLINK-12122
>                 URL: https://issues.apache.org/jira/browse/FLINK-12122
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.6.4, 1.7.2, 1.8.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.2, 1.10.0
>
>         Attachments: image-2019-05-21-12-28-29-538.png, image-2019-05-21-13-02-50-251.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> With Flip-6, we changed the default behaviour of how slots are assigned to
> {{TaskManagers}}. Instead of evenly spreading them out over all registered
> {{TaskManagers}}, we randomly pick slots from {{TaskManagers}} with a
> tendency to first fill up a TM before using another one. This is a regression
> wrt the pre-Flip-6 code.
> I suggest changing the behaviour so that we try to evenly distribute slots
> across all available {{TaskManagers}} by considering how many of their slots
> are already allocated.
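
For illustration only, the idea in the issue description (choose a TaskManager by considering how many of its slots are already allocated) could be sketched roughly as below. This is not Flink's implementation: the class SpreadOutSlots, the TaskManagerSlots type, and the methods pickLeastUtilized and allocate are hypothetical names invented for this sketch, and a real scheduler would also have to account for slot sharing, locality, and resource profiles.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Flink code: spread slot requests across TaskManagers
// by always choosing the one with the lowest fraction of allocated slots.
final class SpreadOutSlots {

    // Hypothetical stand-in for a registered TaskManager and its slot counts.
    static final class TaskManagerSlots {
        final String id;
        final int totalSlots;
        int allocatedSlots;

        TaskManagerSlots(String id, int totalSlots) {
            this.id = id;
            this.totalSlots = totalSlots;
        }

        boolean hasFreeSlot() {
            return allocatedSlots < totalSlots;
        }

        // Fraction of this TaskManager's slots that are already in use.
        double utilization() {
            return (double) allocatedSlots / totalSlots;
        }
    }

    // Returns the TaskManager with a free slot and the lowest utilization, if any.
    static Optional<TaskManagerSlots> pickLeastUtilized(List<TaskManagerSlots> taskManagers) {
        return taskManagers.stream()
                .filter(TaskManagerSlots::hasFreeSlot)
                .min(Comparator.comparingDouble(TaskManagerSlots::utilization));
    }

    // Places up to `requested` slots one at a time; returns how many were placed.
    static int allocate(List<TaskManagerSlots> taskManagers, int requested) {
        int placed = 0;
        for (int i = 0; i < requested; i++) {
            Optional<TaskManagerSlots> target = pickLeastUtilized(taskManagers);
            if (!target.isPresent()) {
                break; // no free slot left anywhere
            }
            target.get().allocatedSlots++;
            placed++;
        }
        return placed;
    }

    public static void main(String[] args) {
        List<TaskManagerSlots> taskManagers = new ArrayList<>();
        taskManagers.add(new TaskManagerSlots("tm-0", 4));
        taskManagers.add(new TaskManagerSlots("tm-1", 4));
        taskManagers.add(new TaskManagerSlots("tm-2", 4));

        allocate(taskManagers, 6);
        for (TaskManagerSlots tm : taskManagers) {
            System.out.println(tm.id + ": " + tm.allocatedSlots + "/" + tm.totalSlots);
        }
    }
}
```

With three TaskManagers of four slots each, six requests end up as two slots per TaskManager under this sketch. A fill-up-first strategy would instead place four on one machine and two on another, which is the concentration described in the comment above.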