mxm commented on code in PR #586: URL: https://github.com/apache/flink-kubernetes-operator/pull/586#discussion_r1187270164
########## flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########

```diff
@@ -68,15 +68,16 @@ private static ConfigOptions.OptionBuilder autoScalerConfig(String key)
     public static final ConfigOption<Double> TARGET_UTILIZATION_BOUNDARY =
             autoScalerConfig("target.utilization.boundary")
                     .doubleType()
-                    .defaultValue(0.1)
+                    .defaultValue(0.4)
```

Review Comment:
Not at all. The boundary is used to calculate a scale-down and a scale-up rate. If the processing capacity falls below the scale-up rate, we will scale up to reach the target capacity. If we exceed the scale-down rate, we will scale down to the target capacity. This is a bit counter-intuitive because the upper and lower bounds are actually reversed.

A `1.0` utilization for the upscale threshold will lower the scale-up rate, which means we delay upscaling in order to utilize 100% of our processing capacity based on the calculated rates. However, we will still scale up if our processing capacity is lower than the scale-up rate. The scale-up rate is always computed from the target rate, but the comparison is made against the actual processing capacity.

For example: let's say we currently have a processing capacity of 100 records/sec. The processing capacity is always estimated at 100% utilization (we also call this the *true rate*). At a target rate of 50 records/second (e.g. the Kafka ingestion rate), the scale-up bound will be 50 rec/s. That means we will only scale up once our processing capacity falls below 50 rec/s, so we delay scaling as much as possible. If the target rate were to increase to 110 rec/s, we would scale up because our processing capacity of 100 rec/s is now lower. Similarly, the downscale rate will be raised (instead of lowered) when we increase the utilization boundary. That means we won't scale down as quickly.
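To make the comparison concrete, here is a minimal sketch of the decision described above. This is not the actual `JobVertexScaler` code; the class, method names, and the exact way the bound is derived from the utilization threshold are assumptions for illustration only:

```java
/** Hypothetical sketch of the scaling decision described in the comment.
 *  Not the operator's actual implementation. */
public class UtilizationBoundsSketch {

    /** Scale up once the measured processing capacity (the "true rate" at
     *  100% utilization) falls below the target rate divided by the upscale
     *  utilization threshold. With a threshold of 1.0 the bound equals the
     *  target rate itself, i.e. upscaling is delayed as long as possible. */
    static boolean shouldScaleUp(
            double processingCapacity, double targetRate, double upscaleUtilization) {
        double scaleUpBound = targetRate / upscaleUtilization;
        return processingCapacity < scaleUpBound;
    }

    /** Symmetric sketch for the downscale side: scale down once the capacity
     *  exceeds the target rate divided by the downscale utilization threshold.
     *  A larger boundary lowers that threshold and thus raises the bound,
     *  making downscaling less eager. */
    static boolean shouldScaleDown(
            double processingCapacity, double targetRate, double downscaleUtilization) {
        double scaleDownBound = targetRate / downscaleUtilization;
        return processingCapacity > scaleDownBound;
    }

    public static void main(String[] args) {
        // Example from the comment: capacity 100 rec/s, target 50 rec/s,
        // upscale utilization threshold 1.0 -> scale-up bound is 50 rec/s.
        System.out.println(shouldScaleUp(100, 50, 1.0));  // false: 100 >= 50
        // If the target rate rises to 110 rec/s, the bound becomes 110 rec/s
        // and our 100 rec/s capacity now falls below it.
        System.out.println(shouldScaleUp(100, 110, 1.0)); // true: 100 < 110
    }
}
```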
The tradeoff here is slightly higher resource usage, but the scaling becomes less aggressive because we will only scale down once our processing capacity exceeds the now-increased "lower bound". To illustrate this further, here are some sketches:

Balanced:
```
------ upscale target rate
------ processing capacity (true rate)
------ downscale target rate
```

We will scale up:
```
------ downscale rate
------ upscale rate
------ processing capacity (true rate)
```

We will scale down:
```
------ processing capacity (true rate)
------ downscale rate
------ upscale rate
```
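One way to see why raising the default from `0.1` to `0.4` makes scaling less aggressive is to compute both bounds for each boundary value. The sketch below assumes the two utilization thresholds are derived as `targetUtilization ± boundary` (capped at 1.0) with a hypothetical target utilization of 0.7; the operator's actual derivation may differ, so treat the numbers as illustrative only:

```java
/** Hypothetical illustration of how a larger utilization boundary widens
 *  the "no scaling" band. Assumes thresholds = targetUtilization +/- boundary
 *  (an assumption, not necessarily the operator's formula). */
public class BoundaryWideningSketch {

    /** Returns {scaleUpBound, scaleDownBound} for the given parameters.
     *  Requires boundary < targetUtilization. */
    static double[] bounds(double targetRate, double targetUtilization, double boundary) {
        double upper = Math.min(1.0, targetUtilization + boundary);
        double lower = targetUtilization - boundary;
        // Scale up below targetRate / upper; scale down above targetRate / lower.
        return new double[] {targetRate / upper, targetRate / lower};
    }

    public static void main(String[] args) {
        double targetRate = 50;         // rec/s, e.g. Kafka ingestion rate
        double targetUtilization = 0.7; // hypothetical target utilization

        double[] narrow = bounds(targetRate, targetUtilization, 0.1); // old default
        double[] wide = bounds(targetRate, targetUtilization, 0.4);   // new default

        // The wider boundary pushes the scale-up bound down and the
        // scale-down bound up, so both actions trigger less often.
        System.out.printf("boundary 0.1: scale up < %.1f, scale down > %.1f%n",
                narrow[0], narrow[1]);
        System.out.printf("boundary 0.4: scale up < %.1f, scale down > %.1f%n",
                wide[0], wide[1]);
    }
}
```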