[jira] [Commented] (FLINK-31976) Once marked as an inefficient scale-up, further scaling may not happen forever

Tan Kim (Jira) Mon, 01 May 2023 06:51:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718219#comment-17718219
 ]


Tan Kim commented on FLINK-31976:
---------------------------------

As you say, this could be improved if we could trim the history from the GET 
operation.

However, it doesn't seem to provide a fundamental workaround.
Below are some scaling-related metrics for situations where the lag is 
increasing, but is marked as ineffective, preventing further scaling.

!image-2023-05-01-22-41-57-208.png|width=655,height=327!

Under normal circumstances (roughly between 09:00 and 10:00), the value of 
TRUE_PROCESSING_RATE_AVG is located between SCALE_UP/DOWN_THRESHOLD. However, 
at around 10:00, after the PARALLELISM value increases from 1 to 2, the 
TRUE_PROCESSING_RATE_AVG value remains significantly below the 
SCALE_UP_THRESHOLD, marking it as ineffective and not continuing to scale.

> Once marked as an inefficient scale-up, further scaling may not happen forever
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-31976
>                 URL: https://issues.apache.org/jira/browse/FLINK-31976
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler
>    Affects Versions: 1.17.0
>            Reporter: Tan Kim
>            Priority: Major
>         Attachments: image-2023-05-01-22-41-57-208.png
>
>
> The determination of whether it is an inefficient scale-up is calculated as 
> follows
> {code:java}
> double lastProcRate = 
> lastSummary.getMetrics().get(TRUE_PROCESSING_RATE).getAverage();
> double lastExpectedProcRate =
> lastSummary.getMetrics().get(EXPECTED_PROCESSING_RATE).getCurrent();
> var currentProcRate = evaluatedMetrics.get(TRUE_PROCESSING_RATE).getAverage();
> double expectedIncrease = lastExpectedProcRate - lastProcRate;
> double actualIncrease = currentProcRate - lastProcRate;
> boolean withinEffectiveThreshold =
> (actualIncrease / expectedIncrease)
> >= conf.get(AutoScalerOptions.SCALING_EFFECTIVENESS_THRESHOLD);{code}
> Because the expectedIncrease value references the last scaling history, it 
> will not change unless there is an additional scale-up, only the 
> actualIncrease value will change.
> The actualIncrease value is currentProcRate( avg of TRUE_PROCESSING_RATE),
> The calculation of TRUE_PROCESSING_RATE is as follows
> trueProcessingRate = busyTimeMultiplier * numRecordsInPerSecond.getSum()
> For example, let's say you've been marked as an inefficient scale-up, but the 
> LAG continues to build up.
> You need to scale up to eliminate the growing LAG, but because you're marked 
> as an inefficient scale-up, it won't happen.
> To unmark a scaleup as inefficient, the following conditions must be met: 
> actualIncrease/expectedIncrease > SCALING_EFFECTIVENESS_THRESHOLD (default 
> 0.1)
> Here, expectedIncrease is a constant with lastSummary, so the value of 
> actualIncrease must increase.
> However, the actualIncrease value is proportional to busyTimeMultiplier and 
> numRecordsInPerSecond, and these two values will converge to a certain value 
> if no scaling occurs.
> Therefore, the value of actualIncrease will also converge.
> If this value fails to cross a threshold, no further scaling up is possible, 
> even if the lag continues to build up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-31976) Once marked as an inefficient scale-up, further scaling may not happen forever

Reply via email to