trystanj commented on code in PR #586:
URL: https://github.com/apache/flink-kubernetes-operator/pull/586#discussion_r1583828939


##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -68,15 +68,16 @@ private static ConfigOptions.OptionBuilder autoScalerConfig(String key) {
     public static final ConfigOption<Double> TARGET_UTILIZATION_BOUNDARY =
             autoScalerConfig("target.utilization.boundary")
                     .doubleType()
-                    .defaultValue(0.1)
+                    .defaultValue(0.4)

Review Comment:
   @mxm your illustration was helpful. I'm having a hard time understanding why the autoscaler makes the decisions it does. The logs, metrics, and definitions of terms are somewhat vague (e.g. I never knew what was meant by "true processing rate" until I saw you explain it here).
   
   I can open a new ticket to document these terms, and perhaps work on that myself, because I think they are critical definitions 😄 
   
   But is source backlog considered here as well? I have a source that is lagging quite substantially, yet it never scales out: the "true processing rate" suggests it should be keeping up, but it never does. Changing the catch-up duration has no effect, either.
   
   At the root of what I am asking is: even if the "true" rate is _calculated_ 
to be sufficient to catch up, _but it isn't actually catching up_ (let's say 
because a source is doing something inefficient with its kafka poll 
settings)... is there any mechanism in place to detect this?
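   To make the concern concrete, here is a hypothetical Java sketch (not the operator's actual code; class and method names are made up) of how a busyness-derived "true processing rate" can overestimate capacity when a source under-reports busy time, e.g. while blocked in an inefficient Kafka poll:

   ```java
   // Hypothetical illustration of the scaling decision, not the operator's implementation.
   public class TrueProcessingRateSketch {

       /**
        * "True" processing rate: records processed per second of *busy* time,
        * i.e. the rate the task could sustain if it were busy 100% of the time.
        */
       static double trueProcessingRate(double observedRecordsPerSecond, double busyRatio) {
           return observedRecordsPerSecond / busyRatio;
       }

       /**
        * Target rate: the incoming rate plus the extra throughput needed to
        * drain the current backlog within the catch-up duration.
        */
       static double targetRate(double incomingRate, double backlogRecords, double catchUpSeconds) {
           return incomingRate + backlogRecords / catchUpSeconds;
       }

       public static void main(String[] args) {
           // A task processing 1000 rec/s while reported 50% busy -> true rate 2000 rec/s.
           double trueRate = trueProcessingRate(1000, 0.5);

           // Incoming 1500 rec/s, 3.6M records of lag, 1h catch-up window -> target 2500 rec/s.
           double target = targetRate(1500, 3_600_000, 3600);

           // Scale up only if the measured capacity falls short of the target.
           // If the busy ratio is under-reported (e.g. the source blocks in poll()
           // without counting as "busy"), trueRate is inflated and no scale-up
           // happens even though lag keeps growing.
           System.out.println("trueRate=" + trueRate
                   + " target=" + target
                   + " scaleUp=" + (trueRate < target));
       }
   }
   ```

   In that situation the computed rate can clear the catch-up target on paper while the observed lag keeps growing, which is exactly the mismatch a lag-trend sanity check could detect.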



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
