Rui Fan created FLINK-36863:
-------------------------------
Summary: Use the maximum parallelism in the past
scale-down.interval window when scaling down
Key: FLINK-36863
URL: https://issues.apache.org/jira/browse/FLINK-36863
Project: Flink
Issue Type: Bug
Components: Autoscaler
Reporter: Rui Fan
Assignee: Rui Fan
When scaling down, FLINK-36535 uses the maximum recommended parallelism
observed since the scale down was triggered, because
VertexDelayedScaleDownInfo only stores the maxRecommendedParallelism.
It's better to use the maximum parallelism in the {color:#de350b}past
scale-down.interval window{color}.
h1. Reason:
Assume the current parallelism is 100 and the scale down interval is 1 hour.
What is the difference between the two approaches?
The following are the recommended parallelisms at different times:
* 2024-12-09 00:00:00 -> 99 (trigger scale down)
* 2024-12-09 00:30:00 -> 90
* 2024-12-09 01:00:00 -> 80
* 2024-12-09 01:30:00 -> 70
* 2024-12-09 02:00:00 -> 60
* 2024-12-09 02:30:00 -> 50
* 2024-12-09 03:00:00 -> 40
With the current code in the main branch, 99 is used as the final
parallelism at 2024-12-09 03:30:00, since we take the maxRecommendedParallelism
from VertexDelayedScaleDownInfo.
This is the bug: 99 is close to the current parallelism (100), so the final
parallelism always stays within the utilization range, and the job or task
never scales down.
Instead, we should use 50 as the final parallelism at 2024-12-09 03:30:00,
because 50 is the max parallelism in the past 1 hour. 50 is not within the
utilization range, so the scale down can be executed.
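To make the difference concrete, here is a minimal Java sketch that replays the timeline above and computes both values. The names (ScaleDownWindowExample, maxSinceTrigger, maxInWindow) are made up for illustration and are not part of the autoscaler code, and the window is assumed to be inclusive on both ends.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of the autoscaler code base. */
final class ScaleDownWindowExample {

    /** Recommended parallelisms from the example, one every 30 minutes. */
    static Map<LocalDateTime, Integer> timeline() {
        int[] recommended = {99, 90, 80, 70, 60, 50, 40};
        LocalDateTime trigger = LocalDateTime.of(2024, 12, 9, 0, 0, 0);
        Map<LocalDateTime, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < recommended.length; i++) {
            result.put(trigger.plusMinutes(30L * i), recommended[i]);
        }
        return result;
    }

    /** Max over everything recorded since the scale down trigger. */
    static int maxSinceTrigger(Map<LocalDateTime, Integer> recommendations) {
        return recommendations.values().stream()
                .mapToInt(Integer::intValue)
                .max()
                .getAsInt();
    }

    /** Max over the recommendations recorded within [now - interval, now]. */
    static int maxInWindow(
            Map<LocalDateTime, Integer> recommendations, LocalDateTime now, Duration interval) {
        LocalDateTime windowStart = now.minus(interval);
        return recommendations.entrySet().stream()
                .filter(e -> !e.getKey().isBefore(windowStart) && !e.getKey().isAfter(now))
                .mapToInt(e -> e.getValue())
                .max()
                .getAsInt();
    }

    public static void main(String[] args) {
        Map<LocalDateTime, Integer> recommendations = timeline();
        LocalDateTime now = LocalDateTime.of(2024, 12, 9, 3, 30, 0);

        // Prints 99: too close to the current parallelism of 100.
        System.out.println(maxSinceTrigger(recommendations));
        // Prints 50: only the 02:30 and 03:00 recommendations are in the window.
        System.out.println(maxInWindow(recommendations, now, Duration.ofHours(1)));
    }
}
```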
h1. Solution:
VertexDelayedScaleDownInfo maintains the recommended parallelism at each point
in time within the past scale-down.interval window.
* Evict the recommended parallelisms recorded before the scale-down.interval window.
* Use the max parallelism within the window as the final parallelism.
Note: This is the problem of calculating the max value within a sliding window.
* It is similar to LeetCode 239 [2].
* If the latest parallelism is greater than a past parallelism, the past
parallelism can never be the max value, so we can evict the past value.
* We only need to maintain a list of monotonically decreasing parallelisms
within the past window.
* The first parallelism in the list is the final parallelism.
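The list described above could look roughly like the following Java sketch. It is only an illustration under assumptions: WindowedMaxParallelism, record and maxAt are hypothetical names, not the actual VertexDelayedScaleDownInfo API, and entries exactly at the window start are treated as still inside the window.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalInt;

/** Hypothetical sliding-window maximum; not the operator's actual class. */
final class WindowedMaxParallelism {

    private static final class Entry {
        final Instant time;
        final int parallelism;

        Entry(Instant time, int parallelism) {
            this.time = time;
            this.parallelism = parallelism;
        }
    }

    // Parallelism strictly decreases from head to tail, timestamps increase.
    private final Deque<Entry> entries = new ArrayDeque<>();
    private final Duration window;

    WindowedMaxParallelism(Duration window) {
        this.window = window;
    }

    /** Records a recommendation; timestamps are assumed to be non-decreasing. */
    void record(Instant time, int parallelism) {
        evictBefore(time.minus(window));
        // Older entries that are not larger than the new one can never be the max again.
        while (!entries.isEmpty() && entries.peekLast().parallelism <= parallelism) {
            entries.pollLast();
        }
        entries.addLast(new Entry(time, parallelism));
    }

    /** Returns the max parallelism within [now - window, now], if any is left. */
    OptionalInt maxAt(Instant now) {
        evictBefore(now.minus(window));
        return entries.isEmpty()
                ? OptionalInt.empty()
                : OptionalInt.of(entries.peekFirst().parallelism);
    }

    private void evictBefore(Instant windowStart) {
        while (!entries.isEmpty() && entries.peekFirst().time.isBefore(windowStart)) {
            entries.pollFirst();
        }
    }

    public static void main(String[] args) {
        Instant trigger = Instant.parse("2024-12-09T00:00:00Z");
        WindowedMaxParallelism tracker = new WindowedMaxParallelism(Duration.ofHours(1));
        int[] recommended = {99, 90, 80, 70, 60, 50, 40};
        for (int i = 0; i < recommended.length; i++) {
            tracker.record(trigger.plus(Duration.ofMinutes(30L * i)), recommended[i]);
        }
        // At 03:30 only the 02:30 and 03:00 entries remain, so this prints 50.
        System.out.println(tracker.maxAt(trigger.plus(Duration.ofMinutes(210))).getAsInt());
    }
}
```

Each entry is added and removed at most once, so the cost per recommendation is amortized constant, and the list never holds more entries than there are recommendations inside the window.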
h1. Note:
This proposal is exactly what FLINK-36535 change1 expects. But I was not aware
of this bug during my development. Sorry for that. :(
* {color:#de350b}Change1{color}: Using the maximum parallelism within the
window instead of the latest parallelism when scaling down.
[1] [https://github.com/apache/flink-kubernetes-operator/blob/d9e8cce85499f26ac0129a2f2d13a083d68b5c21/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/DelayedScaleDown.java#L42]
[2] [https://leetcode.com/problems/sliding-window-maximum/description/]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)