Re: [PR] [FLINK-30593][autoscaler] Determine restart time on the fly for Autoscaler [flink-kubernetes-operator]

via GitHub Mon, 20 Nov 2023 03:34:16 -0800


afedulov commented on code in PR #711:
URL: https://github.com/apache/flink-kubernetes-operator/pull/711#discussion_r1399055639



##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobAutoScalerImpl.java:
##########
@@ -159,19 +161,24 @@ private void runScalingLogic(Context ctx, AutoscalerFlinkMetrics autoscalerMetri
             throws Exception {
 
         var collectedMetrics = metricsCollector.updateMetrics(ctx, stateStore);
+        var jobTopology = collectedMetrics.getJobTopology();
 
         if (collectedMetrics.getMetricHistory().isEmpty()) {
             return;
         }
         LOG.debug("Collected metrics: {}", collectedMetrics);
 
-        var evaluatedMetrics = evaluator.evaluate(ctx.getConfiguration(), collectedMetrics);
+        var now = clock.instant();
+        // Scaling tracking data contains previous restart times that are taken into account
+        var scalingTracking = getTrimmedScalingTracking(stateStore, ctx, now);
+        var evaluatedMetrics =
+                evaluator.evaluate(ctx.getConfiguration(), collectedMetrics, scalingTracking);

Review Comment:
   I don't think this works with the current metrics scoping, since it would 
duplicate the restart time in every vertex, and we are striving to minimize 
the size of the config map. Also, instead of checking just one tracking entry, 
should we iterate over all records of this metric across all vertices and take 
the maximum, or just trust one observation? If we only use one observation, 
and we know it is supposed to be the same for all vertices, why store it at 
the vertex level at all?
   
   Ultimately it comes down to the fact that we need two different views: one 
from the vertex perspective and one from the perspective of the overall job. 
Trying to put data for one into the other causes a lot of issues and does not 
seem justified, especially since we gain nothing in terms of the configmap 
size and in fact make things worse.
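
   To illustrate the job-level view, here is a minimal sketch of what taking 
the maximum over all observations could look like if restart records were 
kept once per job rather than per vertex. All names here 
(`JobLevelRestartTracking`, `RestartObservation`, `maxObservedRestartTime`) 
are hypothetical and not taken from this PR or from the operator codebase; 
the fallback value merely stands in for whatever default restart time is 
configured.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: none of these names come from the PR. */
public class JobLevelRestartTracking {

    /** One observed rescale: when it was triggered and when the job was running again. */
    static final class RestartObservation {
        final Instant triggeredAt;
        final Instant runningAt;

        RestartObservation(Instant triggeredAt, Instant runningAt) {
            this.triggeredAt = triggeredAt;
            this.runningAt = runningAt;
        }

        Duration restartTime() {
            return Duration.between(triggeredAt, runningAt);
        }
    }

    /** Maximum restart time over all job-level records; empty if nothing was observed yet. */
    static Optional<Duration> maxObservedRestartTime(List<RestartObservation> observations) {
        return observations.stream()
                .map(RestartObservation::restartTime)
                .max(Duration::compareTo);
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2023-11-20T10:00:00Z");
        // The history is stored once for the job, so nothing is duplicated per vertex.
        List<RestartObservation> history = List.of(
                new RestartObservation(t0, t0.plusSeconds(45)),
                new RestartObservation(t0.plusSeconds(600), t0.plusSeconds(690)));
        // Assumed fallback for the case where no restart has been observed yet.
        Duration fallback = Duration.ofMinutes(5);
        System.out.println(maxObservedRestartTime(history).orElse(fallback));
    }
}
```

   With the records held once per job, the evaluator can derive a single 
restart time and apply it to every vertex, and the stored state grows with 
the number of rescales rather than with the number of vertices.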



