[ 
https://issues.apache.org/jira/browse/FLINK-33960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-33960:
----------------------------
    Description: 
The Adaptive Scheduler doesn't respect the lowerBound for tasks when a Flink job 
has more than one task.

 

When using the adaptive scheduler and the rescale API, users set the 
lowerBound and upperBound for each job vertex, and they expect the 
parallelism of every vertex to stay between its lowerBound and upperBound.

But when a Flink job has more than one vertex and there aren't enough 
resources, some of the lowerBounds are not respected.
h2. How to reproduce this bug:

One job has 2 job vertices, and we set the resource requirements as follows:
 * Vertex1: lowerBound=2, upperBound=2
 * Vertex2: lowerBound=8, upperBound=8

They are in the same slotSharingGroup, and we only have 5 available slots. This 
job shouldn't run because the slots cannot meet the resource requirement for 
vertex2.

But the job does run, and the parallelism of vertex2 is 5.

 
h2. Why does this bug happen?

Flink calculates the minimumRequiredSlots for each slot sharing group. It 
should be the {color:#FF0000}max{color} lowerBound of all vertices in the 
current slot sharing group.

But it's using the {color:#FF0000}minimum{color} lowerBound.
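
The difference between the two calculations can be illustrated with a minimal, standalone sketch. This is not Flink's actual code: the class name {{MinimumRequiredSlotsSketch}} and the methods {{buggyMinimumRequiredSlots}}, {{expectedMinimumRequiredSlots}}, and {{canSchedule}} are hypothetical names chosen for illustration. It assumes that, within one slot sharing group, a slot can host one subtask of each vertex, so the group needs at least as many slots as its largest lowerBound.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Flink. */
final class MinimumRequiredSlotsSketch {

    /** Behaviour described in this issue: the smallest lowerBound in the group is used. */
    static int buggyMinimumRequiredSlots(Collection<Integer> lowerBounds) {
        return lowerBounds.stream().mapToInt(Integer::intValue).min().orElse(0);
    }

    /** Expected behaviour: the largest lowerBound in the group is used. */
    static int expectedMinimumRequiredSlots(Collection<Integer> lowerBounds) {
        return lowerBounds.stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    /** The job may only be scheduled if enough slots are available. */
    static boolean canSchedule(int minimumRequiredSlots, int availableSlots) {
        return availableSlots >= minimumRequiredSlots;
    }

    public static void main(String[] args) {
        // Reproduction scenario from this issue: one slot sharing group, 5 free slots.
        Map<String, Integer> lowerBounds = new LinkedHashMap<>();
        lowerBounds.put("Vertex1", 2);
        lowerBounds.put("Vertex2", 8);
        int availableSlots = 5;

        int buggy = buggyMinimumRequiredSlots(lowerBounds.values());
        int expected = expectedMinimumRequiredSlots(lowerBounds.values());

        System.out.println("minimum lowerBound = " + buggy
                + ", schedulable = " + canSchedule(buggy, availableSlots));
        System.out.println("max lowerBound = " + expected
                + ", schedulable = " + canSchedule(expected, availableSlots));
    }
}
```

With the values from the reproduction steps, the minimum-based check accepts 5 slots because 5 >= 2, which is how vertex2 can end up with parallelism 5. The max-based check rejects them because 5 < 8.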

  was:
The Adaptive Scheduler doesn't respect the lowerBound for tasks when a Flink job 
has more than one task.

 

When using the adaptive scheduler and the rescale API, users set the 
lowerBound and upperBound for each job vertex, and they expect the 
parallelism of every vertex to stay between its lowerBound and upperBound.

But when a Flink job has more than one vertex and there aren't enough 
resources, some of the lowerBounds are not respected.
h2. How to reproduce this bug:

One job has 3 job vertices, and we set the resource requirements as follows:
 * Vertex1: lowerBound=2, upperBound=2
 * Vertex2: lowerBound=8, upperBound=8
 * Vertex3: lowerBound=2, upperBound=2

They are in the same slotSharingGroup, and we only have 5 available slots. This 
job shouldn't run because the slots cannot meet the resource requirement for 
vertex2.

But the job does run, and the parallelism of vertex2 is 5.


> Adaptive Scheduler doesn't respect the lowerBound for tasks
> -----------------------------------------------------------
>
>                 Key: FLINK-33960
>                 URL: https://issues.apache.org/jira/browse/FLINK-33960
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.17.2, 1.18.1
>            Reporter: Rui Fan
>            Assignee: Rui Fan
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
