trystanj commented on code in PR #586:
URL: https://github.com/apache/flink-kubernetes-operator/pull/586#discussion_r1583828939
##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -68,15 +68,16 @@ private static ConfigOptions.OptionBuilder
autoScalerConfig(String key) {
public static final ConfigOption<Double> TARGET_UTILIZATION_BOUNDARY =
autoScalerConfig("target.utilization.boundary")
.doubleType()
- .defaultValue(0.1)
+ .defaultValue(0.4)
Review Comment:
@mxm your illustration was helpful. I'm having a hard time understanding why
the autoscaler makes the decisions it makes. The logs, metrics, and definitions
of terms are somewhat vague (e.g., I never knew what was meant by "true
processing rate" until I saw you explain it here).
I can open a new ticket to document these terms, and perhaps work on it
myself, because I think these are critical definitions 😄
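For the record, here is my current understanding of "true processing rate" in sketch form: records processed per second of *busy* time, i.e. the rate a task could sustain at 100% utilization. All names here are my own illustration, not the operator's actual classes:

```java
// Hedged sketch of "true processing rate" as I understand it from your
// explanation: observed throughput normalized by busy time rather than
// wall-clock time. Names are illustrative, not the operator's real code.
public class TrueProcessingRateSketch {

    /**
     * @param recordsProcessed records handled during the metric window
     * @param busyTimeMillis   time the task spent busy (not idle or backpressured)
     * @return records per second the task could process if it were never idle
     */
    static double trueProcessingRate(long recordsProcessed, long busyTimeMillis) {
        return recordsProcessed / (busyTimeMillis / 1000.0);
    }

    public static void main(String[] args) {
        // 500k records over a 60s window, but the task was busy only 30s:
        // the observed rate is ~8,333 rec/s, the "true" rate ~16,667 rec/s.
        System.out.println(trueProcessingRate(500_000, 30_000));
    }
}
```

If that reading is right, the "true" rate can be much higher than the observed rate whenever the task reports low busy time, which is exactly the situation I describe below.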
But is source backlog considered in this as well? I have a source that is
lagging quite substantially but never seems to scale out, because the "true
processing rate" suggests it should be keeping up, even though it never does.
The catch-up duration is having no effect, either.
At the root of what I am asking is: even if the "true" rate is _calculated_
to be sufficient to catch up, _but the job isn't actually catching up_ (say,
because a source is doing something inefficient with its Kafka poll
settings)... is there any mechanism in place to detect this and trigger a
scale-out?
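To make the failure mode concrete, here is how I'd expect backlog and catch-up duration to feed into the scaling decision (again, my own names and a deliberately simplified model, not the operator's actual code):

```java
// Illustrative sketch of my mental model: the target rate is the incoming
// source rate plus the backlog spread over the configured catch-up duration,
// and the required parallelism follows from the per-subtask "true" rate.
// All names are hypothetical, not the operator's real classes.
public class CatchUpSketch {

    /** Target rate needed to both keep up and drain the lag in time. */
    static double targetRate(double sourceRatePerSec, long backlogRecords,
                             long catchUpDurationSec) {
        return sourceRatePerSec + (double) backlogRecords / catchUpDurationSec;
    }

    /**
     * If trueRatePerSubtask is overestimated (e.g. an inefficient Kafka poll
     * loop that still reports low busy time), this number stays too small and
     * the job never scales out even though lag keeps growing.
     */
    static int requiredParallelism(double targetRate, double trueRatePerSubtask,
                                   double utilizationTarget) {
        return (int) Math.ceil(targetRate / (trueRatePerSubtask * utilizationTarget));
    }

    public static void main(String[] args) {
        // 10k rec/s incoming, 3.6M records of lag, 600s catch-up window:
        double target = targetRate(10_000, 3_600_000, 600); // 16,000 rec/s
        System.out.println(requiredParallelism(target, 5_000, 0.8));
    }
}
```

In this model, an inflated per-subtask "true" rate makes the computed parallelism look sufficient no matter how the backlog evolves, which is why I'd hope for a feedback check on whether lag is actually shrinking.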
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]