X-czh commented on code in PR #586:
URL: https://github.com/apache/flink-kubernetes-operator/pull/586#discussion_r1186953244
##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -87,28 +88,28 @@ private static ConfigOptions.OptionBuilder autoScalerConfig(String key) {
     public static final ConfigOption<Integer> VERTEX_MAX_PARALLELISM =
             autoScalerConfig("vertex.max-parallelism")
                     .intType()
-                    .defaultValue(Integer.MAX_VALUE)
+                    .defaultValue(200)
                     .withDescription(
                             "The maximum parallelism the autoscaler can use. Note that this limit will be ignored if it is higher than the max parallelism configured in the Flink config or directly on each operator.");

     public static final ConfigOption<Double> MAX_SCALE_DOWN_FACTOR =
             autoScalerConfig("scale-down.max-factor")
                     .doubleType()
-                    .defaultValue(0.6)
+                    .defaultValue(1.0)
Review Comment:
Curious why we chose to loosen it to 1.0. We found that TPR (true processing rate) tends to be heavily overestimated, leading to overly aggressive downscaling, when:
- the pipeline is underloaded and far from its optimal point;
- the average CPU allocated per slot is < 1.

The reason is that linear scaling with busy-time metrics assumes no resource competition between tasks as the load is pushed up. However, when the average CPU allocated per slot is small, resource competition between tasks becomes more and more severe as the overall load of the pipeline rises.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]