Re: [PR] [FLINK-36409] Publish some autoscaler metrics during stabilisation period [flink-kubernetes-operator]

via GitHub Thu, 20 Feb 2025 02:56:14 -0800


mxm commented on PR #945:
URL: https://github.com/apache/flink-kubernetes-operator/pull/945#issuecomment-2671149679


   > I guess Max is under the assumption that the current logic does not
   > collect metrics during the stabilization period. We do collect samples, and
   > once the stabilization period is over we even evaluate them. This PR does not
   > change that logic, so I'm not sure what should be controlled by a flag. The
   > only thing the PR does is report those metrics. Can you clarify? I might be
   > missing something obvious in the current logic.
   
   You're right, we already return metrics from the stabilization phase, but
   only to measure the observed true processing rate. In the original model, we
   only returned metrics once the metric window was full. I think that was more
   elegant, but the source metrics proved so unreliable that we had to measure
   the processing capacity manually instead of always relying on the sources'
   processing rate and busyness metrics.
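
To make the two roles of stabilization-period samples concrete, here is a minimal sketch under simplifying assumptions; it is not the operator's actual collector. Samples taken during the stabilization interval are kept, but only feed the observed true processing rate, while evaluation metrics are returned only once the metric window after stabilization is full. The names `Sample`, `PhaseAwareCollector`, the two `Duration` parameters, and the use of a plain average are all hypothetical choices for illustration (Java 16+ for the record).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical metric sample: timestamp plus a measured processing rate (records/s). */
record Sample(Instant time, double processingRate) {}

/** Hypothetical collector illustrating the two phases; not the operator's implementation. */
class PhaseAwareCollector {
    private final Instant jobStart;
    private final Duration stabilization; // hypothetical stabilization interval
    private final Duration metricWindow;  // hypothetical evaluation window length
    private final List<Sample> stabilizationSamples = new ArrayList<>();
    private final List<Sample> windowSamples = new ArrayList<>();

    PhaseAwareCollector(Instant jobStart, Duration stabilization, Duration metricWindow) {
        this.jobStart = jobStart;
        this.stabilization = stabilization;
        this.metricWindow = metricWindow;
    }

    /** Samples are collected in both phases, but kept apart. */
    void add(Sample s) {
        if (s.time().isBefore(jobStart.plus(stabilization))) {
            stabilizationSamples.add(s);
        } else {
            windowSamples.add(s);
        }
    }

    /** Observed true processing rate: may use samples from both phases (plain average here). */
    OptionalDouble observedTrueProcessingRate() {
        List<Sample> all = new ArrayList<>(stabilizationSamples);
        all.addAll(windowSamples);
        return all.stream().mapToDouble(Sample::processingRate).average();
    }

    /** Evaluation metrics: empty until the metric window after stabilization is full. */
    OptionalDouble evaluatedProcessingRate(Instant now) {
        Instant windowFull = jobStart.plus(stabilization).plus(metricWindow);
        if (now.isBefore(windowFull)) {
            return OptionalDouble.empty(); // nothing to evaluate yet
        }
        return windowSamples.stream().mapToDouble(Sample::processingRate).average();
    }
}
```

With a 5-minute stabilization interval and a 10-minute window, a sample at minute 1 only affects the observed rate, and `evaluatedProcessingRate` stays empty until minute 15.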
   
   I might be a bit pedantic here, but I want to see the actual metrics used
   for evaluation reported as autoscaler metrics. Reporting metrics during
   stabilization removes that clarity: you can only see what the autoscaler's
   assumptions were if you observe exactly what it used for evaluation. That's
   why I suggested putting the reporting of autoscaler metrics during the
   stabilization period behind a flag.
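
The suggested flag can be sketched as a reporter that, by default, publishes only metrics from evaluated cycles and publishes stabilization-period metrics only when explicitly enabled. This is a minimal illustration under assumptions: `MetricsReporter`, the `reportDuringStabilization` flag, and the metric name used in the usage note are hypothetical and are not the operator's actual configuration keys or API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reporter with an opt-in flag for stabilization-period metrics. */
class MetricsReporter {
    private final boolean reportDuringStabilization; // hypothetical flag, off by default
    private final Map<String, Double> published = new LinkedHashMap<>();

    MetricsReporter(boolean reportDuringStabilization) {
        this.reportDuringStabilization = reportDuringStabilization;
    }

    /**
     * @param stabilizing true while the job is still in its stabilization period
     * @param metrics     metric name to value, computed in the current cycle
     */
    void report(boolean stabilizing, Map<String, Double> metrics) {
        if (stabilizing && !reportDuringStabilization) {
            // Default: skip, so published values always match what was evaluated.
            return;
        }
        published.putAll(metrics);
    }

    /** Snapshot of everything published so far. */
    Map<String, Double> published() {
        return Map.copyOf(published);
    }
}
```

Keeping the flag off by default preserves the property argued for above: anything visible as an autoscaler metric was actually used for evaluation. For example, `new MetricsReporter(false).report(true, Map.of("trueProcessingRate", 100.0))` publishes nothing.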


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.