swatiksi273-ksolves commented on code in PR #1139:
URL: https://github.com/apache/flink-kubernetes-operator/pull/1139#discussion_r3438211925
##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/metrics/ScalingMetrics.java:
##########
@@ -83,6 +83,15 @@ public static void computeDataRateMetrics(
var isSource = topology.isSource(jobVertexID);
var ioMetrics = topology.get(jobVertexID).getIoMetrics();
+ if (!ioMetrics.isMetricsComplete()) {
+ LOG.warn(
+ "Incomplete IO metrics for vertex {}, skipping scaling decision to avoid incorrect scale down.",
+ jobVertexID);
+ scalingMetrics.put(ScalingMetric.NUM_RECORDS_IN, Double.NaN);
+ scalingMetrics.put(ScalingMetric.NUM_RECORDS_OUT, Double.NaN);
Review Comment:
Hi @Dennis-Mircea, thanks for the feedback!
I've updated the PR based on your suggestion. The fix now lives in
ScalingMetricCollector.getJobTopology() instead of ScalingMetrics.java, and
there is no NaN anywhere. When any vertex reports read-records-complete: false
or write-records-complete: false, we now throw NotReadyException directly. This
makes the autoscaler skip the entire collection cycle and retry on the next
interval.
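For readers following along, here is a minimal sketch of the guard described above, in plain Java. It is not the code in the PR: `VertexIoMetrics`, `requireCompleteIoMetrics` and the local `NotReadyException` class are hypothetical stand-ins for the autoscaler's own topology types and exception. The sketch assumes the two completeness flags have already been read from the REST response for every vertex.

```java
import java.util.List;

class IoMetricsCompletenessGuard {

    /** Hypothetical stand-in for the autoscaler's NotReadyException. */
    static class NotReadyException extends RuntimeException {
        NotReadyException(String message) {
            super(message);
        }
    }

    /** Hypothetical per-vertex view of the two REST completeness flags. */
    record VertexIoMetrics(String vertexId, boolean readRecordsComplete, boolean writeRecordsComplete) {}

    /**
     * Throws as soon as any vertex reports incomplete read or write record
     * counts, so the caller skips this collection cycle instead of computing
     * rates from partial counters.
     */
    static void requireCompleteIoMetrics(List<VertexIoMetrics> vertices) {
        for (VertexIoMetrics v : vertices) {
            if (!v.readRecordsComplete() || !v.writeRecordsComplete()) {
                throw new NotReadyException(
                        "Incomplete IO metrics for vertex " + v.vertexId() + ", retrying next interval");
            }
        }
    }

    public static void main(String[] args) {
        List<VertexIoMetrics> healthy = List.of(
                new VertexIoMetrics("source", true, true),
                new VertexIoMetrics("sink", true, true));
        requireCompleteIoMetrics(healthy); // all flags true: no exception

        List<VertexIoMetrics> partial = List.of(
                new VertexIoMetrics("source", true, true),
                new VertexIoMetrics("sink", false, true));
        try {
            requireCompleteIoMetrics(partial);
            System.out.println("unexpected: no exception");
        } catch (NotReadyException e) {
            System.out.println("skipped cycle: " + e.getMessage());
        }
    }
}
```

Throwing instead of substituting NaN keeps partial counters out of the rate calculations entirely. The trade-off is that a single vertex with incomplete metrics delays the whole cycle until its counters are complete.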
Changes in this update:
1. Reverted all changes to IOMetrics.java and ScalingMetrics.java.
2. Fixed ScalingMetricCollector.java: it now checks the complete flags before
building the metrics map.
3. Added the testIncompleteIoMetricsThrowsNotReadyException test, using the
exact REST API payload reported by Trystan.
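The test in item 3 relies on the completeness flags that the Flink REST API reports in each vertex's `metrics` object (`read-records-complete`, `write-records-complete`). Below is a hypothetical sketch of reading them, with a plain `Map` standing in for the parsed JSON. The class and method names are illustrative, the values are made up rather than taken from Trystan's payload, and treating a missing flag as incomplete is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;

class RestIoMetricsFlags {

    /**
     * True only if both record-count completeness flags are present and true.
     * A missing key counts as incomplete (assumption of this sketch).
     */
    static boolean isComplete(Map<String, Object> metrics) {
        return Boolean.TRUE.equals(metrics.get("read-records-complete"))
                && Boolean.TRUE.equals(metrics.get("write-records-complete"));
    }

    /** Returns the sorted ids of vertices whose metrics are not yet complete. */
    static List<String> incompleteVertices(Map<String, Map<String, Object>> metricsByVertex) {
        return metricsByVertex.entrySet().stream()
                .filter(e -> !isComplete(e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        // Illustrative values only, not the payload from the bug report.
        Map<String, Object> source = Map.of(
                "read-records", 0L, "read-records-complete", true,
                "write-records", 1200L, "write-records-complete", true);
        Map<String, Object> sink = Map.of(
                "read-records", 0L, "read-records-complete", false,
                "write-records", 0L, "write-records-complete", true);
        System.out.println("Incomplete vertices: "
                + incompleteVertices(Map.of("source", source, "sink", sink)));
        // prints "Incomplete vertices: [sink]"
    }
}
```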
Only two files changed: ScalingMetricCollector.java (+12 lines, the only
production code change) and ScalingMetricCollectorTest.java (+1 test).
Regarding cluster testing: I tested on minikube with the fix deployed. The
complete: false window is very short in minikube because all pods run on the
same node, but the reporter (Trystan) confirmed the root cause on a real
cluster: killing and restarting the JM resolved the issue, and the metrics
returned complete: true after the restart.
--
This is an automated message from the Apache Git Service.