[
https://issues.apache.org/jira/browse/FLINK-33960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rui Fan updated FLINK-33960:
----------------------------
Description:
The Adaptive Scheduler doesn't respect the lowerBound for tasks when a Flink job
has more than one task.
When using the adaptive scheduler and the rescale API, users set the
lowerBound and upperBound for each job vertex, and they expect the
parallelism of every vertex to stay between its lowerBound and upperBound.
But when a Flink job has more than one vertex and resources are insufficient,
some of the lowerBounds won't be respected.
h2. How to reproduce this bug:
One job has 2 job vertices, with the following resource requirements:
* Vertex1: lowerBound=2, upperBound=2
* Vertex2: lowerBound=8, upperBound=8
They are in the same slotSharingGroup, and only 5 slots are available. This job
shouldn't run because the slots cannot meet the resource requirement for vertex2.
But the job does run, and the parallelism of vertex2 is 5.
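The scenario above can be checked with a small sketch. It assumes a simplified model in which each slot of a shared group runs one subtask of every vertex, so a vertex's parallelism is capped by the number of available slots. The function names are hypothetical and are not Flink APIs.

```python
def assigned_parallelism(upper_bound, available_slots):
    # Simplified assumption: with slot sharing, a vertex cannot get more
    # subtasks than there are slots, and never more than its upperBound.
    return min(upper_bound, available_slots)


def violates_lower_bound(lower_bound, upper_bound, available_slots):
    # A vertex is violated when the parallelism it would receive
    # falls below its configured lowerBound.
    return assigned_parallelism(upper_bound, available_slots) < lower_bound


# Resource requirements from the reproduction: (lowerBound, upperBound).
requirements = {"Vertex1": (2, 2), "Vertex2": (8, 8)}
available_slots = 5

violations = [
    name
    for name, (lower, upper) in requirements.items()
    if violates_lower_bound(lower, upper, available_slots)
]
```

Under these assumptions Vertex2 would receive a parallelism of 5, which is below its lowerBound of 8, so it is the only vertex listed in `violations`.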
h2. Why does this bug happen?
Flink calculates the minimumRequiredSlots for each slot sharing group. It
should be the {color:#FF0000}max{color} lowerBound of all vertices in the
slot sharing group.
But it's using the {color:#FF0000}minimum{color} lowerBound.
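The difference between the two aggregations can be sketched as follows. This is a minimal illustration with hypothetical function names, not Flink's actual implementation; it only assumes that a job is allowed to run when the available slots cover the minimumRequiredSlots of the slot sharing group.

```python
def minimum_required_slots_buggy(lower_bounds):
    # Behavior described in this issue: the smallest lowerBound is used.
    return min(lower_bounds)


def minimum_required_slots_expected(lower_bounds):
    # Expected behavior: the largest lowerBound in the group is used,
    # so that every vertex can reach its own lowerBound.
    return max(lower_bounds)


def can_schedule(required_slots, available_slots):
    # Assumption: scheduling proceeds once enough slots are available.
    return available_slots >= required_slots


# lowerBounds of Vertex1 and Vertex2 from the reproduction above.
lower_bounds = [2, 8]
available_slots = 5

runs_with_min = can_schedule(minimum_required_slots_buggy(lower_bounds), available_slots)
runs_with_max = can_schedule(minimum_required_slots_expected(lower_bounds), available_slots)
```

With the minimum, the group requires only 2 slots and the job starts on 5 slots. With the maximum, it requires 8 slots and the job is held back, which is the behavior the reporter expects.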
was:
The Adaptive Scheduler doesn't respect the lowerBound for tasks when a Flink job
has more than one task.
When using the adaptive scheduler and the rescale API, users set the
lowerBound and upperBound for each job vertex, and they expect the
parallelism of every vertex to stay between its lowerBound and upperBound.
But when a Flink job has more than one vertex and resources are insufficient,
some of the lowerBounds won't be respected.
h2. How to reproduce this bug:
One job has 3 job vertices, with the following resource requirements:
* Vertex1: lowerBound=2, upperBound=2
* Vertex2: lowerBound=8, upperBound=8
* Vertex3: lowerBound=2, upperBound=2
They are in the same slotSharingGroup, and only 5 slots are available. This job
shouldn't run because the slots cannot meet the resource requirement for vertex2.
But the job does run, and the parallelism of vertex2 is 5.
> Adaptive Scheduler doesn't respect the lowerBound for tasks
> -----------------------------------------------------------
>
> Key: FLINK-33960
> URL: https://issues.apache.org/jira/browse/FLINK-33960
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.17.2, 1.18.1
> Reporter: Rui Fan
> Assignee: Rui Fan
> Priority: Major
>