loyi created FLINK-23190:
----------------------------
Summary: Make task-slot allocation more even
Key: FLINK-23190
URL: https://issues.apache.org/jira/browse/FLINK-23190
Project: Flink
Issue Type: Improvement
Components: Runtime / Task
Affects Versions: 1.12.3
Reporter: loyi
Description:
FLINK-12122 only guarantees that tasks are spread out across the set of TMs
that are registered at the time of scheduling. Our jobs all run in active
YARN mode, and a job whose source has a smaller parallelism than its
downstream operators often ends up with load-balancing issues.
For this job:
{code:java}
// -ys 4 (4 slots per TM) with a max parallelism of 40 means 10 TaskManagers
env.addSource(...).name("A").setParallelism(10)
    .map(...).name("B").setParallelism(30)
    .map(...).name("C").setParallelism(40)
    .addSink(...).name("D").setParallelism(20);
{code}
release-1.12.3 allocation:
||operator||tm1||tm2||tm3||tm4||tm5||tm6||tm7||tm8||tm9||tm10||
|A|1|{color:#de350b}2{color}|{color:#de350b}2{color}|1|1|{color:#de350b}3{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|{color:#de350b}0{color}|
|B|3|3|3|3|3|3|3|3|{color:#de350b}2{color}|{color:#de350b}4{color}|
|C|4|4|4|4|4|4|4|4|4|4|
|D|2|2|2|2|2|{color:#de350b}1{color}|{color:#de350b}1{color}|2|2|{color:#de350b}4{color}|
Suggestions:
When a TM registers its slots with the SlotManager, we could group the
pendingRequests by their "ExecutionVertexGroup" and then allocate the slots
proportionally to each group.
I have implemented a proof-of-concept version based on release-1.12.3, and
with it the job gets a fully even task allocation. Are there other points
that I have not considered?
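To illustrate the idea, here is a minimal standalone sketch of the proportional split. It is not the actual patch and does not use any Flink classes. The class name, the `allocate` method and the group labels are hypothetical. It assumes that slot sharing leaves the job above with four groups of 10 pending slot requests each (slots holding A+B+C+D, B+C+D, B+C, and C only), and it uses the largest-remainder method to divide the slots of one newly registered TM among the groups.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: splits the free slots of one TM
// across groups of pending requests in proportion to each group's size.
final class ProportionalSlotAllocation {

    static Map<String, Integer> allocate(Map<String, Integer> pendingByGroup, int freeSlots) {
        int totalPending = 0;
        for (int pending : pendingByGroup.values()) {
            totalPending += pending;
        }

        Map<String, Integer> result = new LinkedHashMap<>();
        for (String group : pendingByGroup.keySet()) {
            result.put(group, 0);
        }
        if (totalPending == 0 || freeSlots <= 0) {
            return result;
        }

        // Never hand out more slots than there are pending requests.
        int toAssign = Math.min(freeSlots, totalPending);

        // First pass: every group gets the floor of its exact proportional share.
        List<String> groups = new ArrayList<>(pendingByGroup.keySet());
        Map<String, Long> remainder = new LinkedHashMap<>();
        int assigned = 0;
        for (String group : groups) {
            long scaled = (long) toAssign * pendingByGroup.get(group);
            int share = (int) (scaled / totalPending);
            result.put(group, share);
            remainder.put(group, scaled % totalPending);
            assigned += share;
        }

        // Second pass: leftover slots go to the groups with the largest remainders.
        groups.sort((a, b) -> Long.compare(remainder.get(b), remainder.get(a)));
        for (String group : groups) {
            if (assigned == toAssign) {
                break;
            }
            if (result.get(group) < pendingByGroup.get(group)) {
                result.put(group, result.get(group) + 1);
                assigned++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed grouping for the example job; the labels are illustrative only.
        Map<String, Integer> pending = new LinkedHashMap<>();
        pending.put("A+B+C+D", 10);
        pending.put("B+C+D", 10);
        pending.put("B+C", 10);
        pending.put("C", 10);

        // One TM registers with 4 free slots (-ys 4).
        System.out.println(allocate(pending, 4));
    }
}
```

Under these assumptions every TM receives one slot from each group, which corresponds to 1 subtask of A, 3 of B, 4 of C and 2 of D per TM. A real implementation would also have to handle TMs that register while other requests are already fulfilled, and slot sharing groups that differ from the one assumed here.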
--
This message was sent by Atlassian Jira
(v8.3.4#803005)