trystanj commented on code in PR #586:
URL: https://github.com/apache/flink-kubernetes-operator/pull/586#discussion_r1583828939
##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -68,15 +68,16 @@ private static ConfigOptions.OptionBuilder autoScalerConfig(String key) {
public static final ConfigOption<Double> TARGET_UTILIZATION_BOUNDARY =
autoScalerConfig("target.utilization.boundary")
.doubleType()
- .defaultValue(0.1)
+ .defaultValue(0.4)
Review Comment:
@mxm your illustration was helpful. I'm having a hard time understanding why
the autoscaler makes the decisions it makes: the logs, metrics, and definitions
of terms are somewhat vague (e.g. I never knew what "true processing rate"
meant until I saw you explain it here).
I can open a new ticket to document these terms, and perhaps work on it myself,
because I think they are critical definitions 😄
But is source backlog considered in this as well? I have a source that is
lagging quite substantially, but it never seems to scale out because the "true
processing rate" suggests it should be keeping up, even though it never does.
The catch-up duration is having no effect, either.
At the root of what I am asking is: even if the "true rate" or busy
utilization is _calculated_ to be sufficient to catch up, _but the job isn't
actually catching up_ (say because a source is doing something inefficient
with its Kafka poll settings), is there any mechanism in place to detect this
and trigger a scale-out of that source vertex?
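For context, the kind of configuration I'm tuning looks roughly like the sketch
below. The `kubernetes.operator.job.autoscaler.` prefix is my reading of what
`autoScalerConfig(...)` in this file expands to, and the values are purely
illustrative, not my actual deployment:

```yaml
# Hypothetical flinkConfiguration snippet; option prefix assumed, values illustrative.
kubernetes.operator.job.autoscaler.enabled: "true"
# Target busy-time utilization per vertex (fraction of capacity to aim for).
kubernetes.operator.job.autoscaler.target.utilization: "0.6"
# Allowed deviation around the target before a rescale is considered
# (the default for this option is what the diff above changes).
kubernetes.operator.job.autoscaler.target.utilization.boundary: "0.4"
# Time budget reserved for consuming accumulated backlog after a rescale.
kubernetes.operator.job.autoscaler.catch-up.duration: "10m"
```

My expectation was that a sustained backlog would force a scale-out within the
catch-up window regardless of what the computed "true processing rate" says,
but that does not appear to be happening.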
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]