trystanj commented on code in PR #586:
URL: https://github.com/apache/flink-kubernetes-operator/pull/586#discussion_r1583828939


##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -68,15 +68,16 @@ private static ConfigOptions.OptionBuilder 
autoScalerConfig(String key) {
     public static final ConfigOption<Double> TARGET_UTILIZATION_BOUNDARY =
             autoScalerConfig("target.utilization.boundary")
                     .doubleType()
-                    .defaultValue(0.1)
+                    .defaultValue(0.4)

Review Comment:
   @mxm your illustration was helpful. I'm having a hard time understanding how 
the autoscaler arrives at its decisions. The logs, metrics, and definitions of 
terms are somewhat vague (e.g. I never knew what "true processing rate" meant 
until I saw you explain it here).
   
   I can open a new ticket to document these terms, and perhaps work on it 
myself, because I think these definitions are critical 😄 
   
   But is source backlog considered in this as well? I have a source that is 
lagging quite substantially, but it never seems to scale out: the "true 
processing rate" suggests it should be keeping up, yet it never does. Changing 
the catch-up duration has no effect, either.
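   For reference, this is roughly how I have the relevant options set. The 
values are mine, and I'm assuming the keys carry the operator's 
`kubernetes.operator.job.autoscaler.` prefix based on my reading of 
`autoScalerConfig(...)` in this file, so please correct me if the prefix is 
different:

```yaml
# Hypothetical values; the key prefix is my assumption from autoScalerConfig(...).
kubernetes.operator.job.autoscaler.enabled: "true"
kubernetes.operator.job.autoscaler.target.utilization: "0.7"
kubernetes.operator.job.autoscaler.target.utilization.boundary: "0.4"
kubernetes.operator.job.autoscaler.catch-up.duration: "10m"
```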
   
   At the root of what I am asking is: even if the "true rate" or busy 
utilization is _calculated_ to be sufficient to catch up, _but the job isn't 
actually catching up_ (say, because a source is doing something inefficient 
with its Kafka poll settings)... is there any mechanism in place to detect this 
and trigger a scale-out of that source vertex?
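   To make the question concrete, here is a rough sketch of the scaling 
arithmetic as I currently understand it. This is my own illustration, not the 
operator's actual code, and every name in it is made up; the point is that a 
stale or optimistic `trueProcessingRate` makes the computed parallelism look 
sufficient even while real lag keeps growing:

```java
// Illustrative sketch only (not the autoscaler's implementation).
// All names are hypothetical.
public class CatchUpSketch {

    static int requiredParallelism(
            double inputRatePerSec,     // records/s arriving at the source
            double backlogRecords,      // current consumer lag
            double catchUpSeconds,      // catch-up window, e.g. 10 minutes
            double trueProcessingRate,  // records/s one busy subtask processes
            double utilizationTarget) { // target busy fraction, e.g. 0.7
        // Target rate: steady-state input plus the extra rate needed
        // to drain the backlog within the catch-up window.
        double targetRate = inputRatePerSec + backlogRecords / catchUpSeconds;
        // Each subtask is only expected to run at the utilization target,
        // not at 100% busy, so its usable capacity is scaled down.
        double perSubtaskCapacity = trueProcessingRate * utilizationTarget;
        return (int) Math.ceil(targetRate / perSubtaskCapacity);
    }

    public static void main(String[] args) {
        // 1000 rec/s input, 600k lag, 10 min window, 800 rec/s per subtask:
        // targetRate = 2000, capacity = 560, so ceil(3.57) = 4 subtasks.
        System.out.println(requiredParallelism(1000, 600_000, 600, 800, 0.7));
        // -> 4
    }
}
```

   If the measured `trueProcessingRate` overstates what the source can really 
sustain (e.g. due to poll-settings overhead the metric doesn't see), this 
formula happily reports the current parallelism as sufficient forever.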



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
