[jira] [Comment Edited] (FLINK-31976) Once marked as an inefficient scale-up, further scaling may not happen forever

Tan Kim (Jira) Mon, 01 May 2023 10:09:07 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718283#comment-17718283
 ]


Tan Kim edited comment on FLINK-31976 at 5/1/23 5:08 PM:
---------------------------------------------------------

As you say, even if the number of records doesn't change, the true processing 
rate should increase when busy time decreases.
But in the metrics, the true processing rate stays the same.
In fact, the busy time metric does not decrease when parallelism increases, 
contrary to what you describe.
How should I interpret this?

!image-2023-05-02-02-08-25-920.png|width=603,height=270!
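
To make the expectation concrete, here is a minimal sketch of the relationship in question. It is not the autoscaler's actual code: the class and method names are hypothetical, the input numbers are made up, and the definition busyTimeMultiplier = 1000 / busyTimeMsPerSecond is an assumption, since this thread only gives the product formula.

```java
public class TrueProcessingRateSketch {

    // Assumption: busyTimeMultiplier = 1000 / busyTimeMsPerSecond.
    // Only the product formula below is quoted in this issue.
    static double trueProcessingRate(double busyTimeMsPerSecond, double numRecordsInPerSecond) {
        double busyTimeMultiplier = 1000.0 / busyTimeMsPerSecond;
        return busyTimeMultiplier * numRecordsInPerSecond;
    }

    public static void main(String[] args) {
        double numRecordsInPerSecond = 500.0; // hypothetical, held constant

        // Fully busy: the observed rate equals the true processing rate.
        System.out.println(trueProcessingRate(1000.0, numRecordsInPerSecond));

        // Busy half of the time with the same input rate: the true rate doubles.
        System.out.println(trueProcessingRate(500.0, numRecordsInPerSecond));
    }
}
```

Under these assumptions, a constant record rate with unchanged busy time yields an unchanged true processing rate, which matches what the graph shows.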


> Once marked as an inefficient scale-up, further scaling may not happen forever
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-31976
>                 URL: https://issues.apache.org/jira/browse/FLINK-31976
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>    Affects Versions: 1.17.0
>            Reporter: Tan Kim
>            Priority: Major
>         Attachments: image-2023-05-01-22-41-57-208.png, 
> image-2023-05-01-23-54-06-383.png, image-2023-05-01-23-55-08-254.png, 
> image-2023-05-02-01-56-15-095.png, image-2023-05-02-02-08-25-920.png
>
>
> Whether a scale-up is considered inefficient is determined as follows:
> {code:java}
> double lastProcRate =
>         lastSummary.getMetrics().get(TRUE_PROCESSING_RATE).getAverage();
> double lastExpectedProcRate =
>         lastSummary.getMetrics().get(EXPECTED_PROCESSING_RATE).getCurrent();
> var currentProcRate = evaluatedMetrics.get(TRUE_PROCESSING_RATE).getAverage();
> double expectedIncrease = lastExpectedProcRate - lastProcRate;
> double actualIncrease = currentProcRate - lastProcRate;
> boolean withinEffectiveThreshold =
>         (actualIncrease / expectedIncrease)
>                 >= conf.get(AutoScalerOptions.SCALING_EFFECTIVENESS_THRESHOLD);
> {code}
> Because expectedIncrease is derived from the last scaling history entry, it 
> does not change unless another scale-up happens. Only actualIncrease can 
> change.
> actualIncrease depends on currentProcRate (the average of 
> TRUE_PROCESSING_RATE).
> TRUE_PROCESSING_RATE is calculated as follows:
> trueProcessingRate = busyTimeMultiplier * numRecordsInPerSecond.getSum()
> For example, suppose a scale-up has been marked as inefficient, but the lag 
> continues to build up.
> The job needs to scale up to eliminate the growing lag, but because the last 
> scale-up is marked as inefficient, that does not happen.
> For a scale-up to stop being marked as inefficient, the following condition 
> must be met: 
> actualIncrease / expectedIncrease >= SCALING_EFFECTIVENESS_THRESHOLD (default 
> 0.1)
> Here, expectedIncrease is constant because it comes from lastSummary, so 
> actualIncrease itself must grow.
> However, the actualIncrease value is proportional to busyTimeMultiplier and 
> numRecordsInPerSecond, and these two values will converge to a certain value 
> if no scaling occurs.
> Therefore, the value of actualIncrease will also converge.
> If the converged value does not reach the threshold, no further scale-up is 
> possible, even if the lag continues to build up.
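
The stuck state described above can be illustrated with a minimal sketch of the same check. The class name, method name, and all rate values are hypothetical and chosen only for illustration. The threshold of 0.1 is the default quoted in the description.

```java
public class ScalingEffectivenessSketch {

    // Mirrors the check quoted in the issue, with plain doubles instead of
    // the autoscaler's metric objects.
    static boolean withinEffectiveThreshold(
            double lastProcRate,
            double lastExpectedProcRate,
            double currentProcRate,
            double threshold) {
        double expectedIncrease = lastExpectedProcRate - lastProcRate;
        double actualIncrease = currentProcRate - lastProcRate;
        return (actualIncrease / expectedIncrease) >= threshold;
    }

    public static void main(String[] args) {
        double lastProcRate = 1000.0;         // hypothetical rate before the last scale-up
        double lastExpectedProcRate = 2000.0; // hypothetical rate expected after it
        double threshold = 0.1;               // default quoted in the issue

        // Hypothetical series of current rates that converges at 1050 when no
        // further scaling occurs. expectedIncrease stays fixed at 1000.
        double[] currentProcRates = {1020.0, 1040.0, 1050.0, 1050.0, 1050.0};
        for (double currentProcRate : currentProcRates) {
            boolean effective = withinEffectiveThreshold(
                    lastProcRate, lastExpectedProcRate, currentProcRate, threshold);
            System.out.println(currentProcRate + " -> effective=" + effective);
        }
    }
}
```

With these numbers the ratio tops out at 50 / 1000 = 0.05, so every evaluation reports the scale-up as ineffective. Because the inputs have converged, nothing in the check itself can change that result.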



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
