[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #581: [FLINK-31936] Support setting scale up max factor

via GitHub Thu, 04 May 2023 02:58:43 -0700


mxm commented on code in PR #581:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/581#discussion_r1184795632



##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -98,6 +98,13 @@ private static ConfigOptions.OptionBuilder 
autoScalerConfig(String key) {
                     .withDescription(
                             "Max scale down factor. 1 means no limit on scale 
down, 0.6 means job can only be scaled down with 60% of the original 
parallelism.");
 
+    public static final ConfigOption<Double> MAX_SCALE_UP_FACTOR =
+            autoScalerConfig("scale-up.max-factor")
+                    .doubleType()
+                    .defaultValue(2.0)
+                    .withDescription(
+                            "Max scale up factor. 2.0 means job can only be 
scaled up with 200% of the original parallelism.");

Review Comment:
   ```suggestion
                               "Max scale up factor. 2.0 means job can only be 
scaled up with 200% of the current parallelism.");
   ```
   
   Just to make it clear that it can double every time.



##########
flink-kubernetes-operator-autoscaler/src/test/java/org/apache/flink/kubernetes/operator/autoscaler/BacklogBasedScalingTest.java:
##########
@@ -100,6 +100,7 @@ public void setup() {
         defaultConf.set(AutoScalerOptions.CATCH_UP_DURATION, 
Duration.ofSeconds(2));
         defaultConf.set(AutoScalerOptions.SCALING_ENABLED, true);
         defaultConf.set(AutoScalerOptions.MAX_SCALE_DOWN_FACTOR, 1.);
+        defaultConf.set(AutoScalerOptions.MAX_SCALE_UP_FACTOR, 
Double.MAX_VALUE);

Review Comment:
   In my tests, MAX_VALUE might be problematic because:
   
   ```
   > Double.MAX_VALUE * 2
   Infinity
   ```



##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -98,6 +98,13 @@ private static ConfigOptions.OptionBuilder 
autoScalerConfig(String key) {
                     .withDescription(
                             "Max scale down factor. 1 means no limit on scale 
down, 0.6 means job can only be scaled down with 60% of the original 
parallelism.");
 
+    public static final ConfigOption<Double> MAX_SCALE_UP_FACTOR =
+            autoScalerConfig("scale-up.max-factor")
+                    .doubleType()
+                    .defaultValue(2.0)

Review Comment:
   I would suggest to set the default to the maximum parallelism, to be able to 
scale up from 1 to the max configured parallelism.



##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -98,6 +98,13 @@ private static ConfigOptions.OptionBuilder 
autoScalerConfig(String key) {
                     .withDescription(
                             "Max scale down factor. 1 means no limit on scale 
down, 0.6 means job can only be scaled down with 60% of the original 
parallelism.");
 
+    public static final ConfigOption<Double> MAX_SCALE_UP_FACTOR =
+            autoScalerConfig("scale-up.max-factor")
+                    .doubleType()
+                    .defaultValue(2.0)

Review Comment:
   In our production setup, we would definitely set this to a very high number, 
effectively disabling it. The reason is that we have pipelines which sit most 
idle (parallelism 1) but then receive huge traffic spikes which need quite high 
parallelisms (up to 500) to be able to keep up. The only way we limit the scale 
up is by setting a max parallelism.
   
   Being able to scale up quickly is very important to keep up with the backlog 
size / growth. If you cap the scale up, you limit the effectiveness of the 
scaling. I get where it comes from but generally I think it is curing the 
symptoms and not the cause of these spikes.
   
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink-kubernetes-operator] mxm commented on a diff in pull request #581: [FLINK-31936] Support setting scale up max factor

Reply via email to