[GitHub] [flink-kubernetes-operator] mbalassi commented on a diff in pull request #558: [FLINK-31303] Expose Flink application resource usage via metrics and status

via GitHub Sun, 02 Apr 2023 02:28:30 -0700


mbalassi commented on code in PR #558:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/558#discussion_r1155274830



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java:
##########
@@ -627,14 +637,42 @@ public Map<String, String> getClusterInfo(Configuration 
conf) throws Exception {
                                             .toSeconds(),
                                     TimeUnit.SECONDS);
 
-            runtimeVersion.put(
+            clusterInfo.put(
                     DashboardConfiguration.FIELD_NAME_FLINK_VERSION,
                     dashboardConfiguration.getFlinkVersion());
-            runtimeVersion.put(
+            clusterInfo.put(
                     DashboardConfiguration.FIELD_NAME_FLINK_REVISION,
                     dashboardConfiguration.getFlinkRevision());
         }
-        return runtimeVersion;
+
+        // JobManager resource usage can be deduced from the CR
+        var jmParameters =
+                new KubernetesJobManagerParameters(
+                        conf, new 
KubernetesClusterClientFactory().getClusterSpecification(conf));
+        var jmTotalCpu =
+                jmParameters.getJobManagerCPU()
+                        * jmParameters.getJobManagerCPULimitFactor()
+                        * jmParameters.getReplicas();
+        var jmTotalMemory =
+                Math.round(
+                        jmParameters.getJobManagerMemoryMB()
+                                * Math.pow(1024, 2)
+                                * jmParameters.getJobManagerMemoryLimitFactor()
+                                * jmParameters.getReplicas());
+
+        // TaskManager resource usage is best gathered from the REST API to 
get current replicas

Review Comment:
   Thanks @mateczagany, this approach looks good. If you have the bandwidth 
would you mind pushing your suggestions to this PR branch so that the commit 
can be attributed to you? 😏 I have invited you as a collaborator to my fork, 
you might need to accept that.
   
   I would ask the following if you have the time:
   
   1. Get resource configuration from the config as you suggested uniformly for 
JMs and TMs
   2. Get JM replicas from config, TM replicas from the REST API (we are trying 
to be careful with the TM replicas because we foresee that we might be changing 
things dynamically there via the autoscaler soon)
   3. Add a test to `FlinkDeploymentMetricsTest` that verifies that given that 
the `status.clusterInfo` is properly filled out we fill out the metrics 
properly.
   
   Currently we do not have meaningful test for creating the clusterInfo and 
since we are relying on the application's REST API I do not see an easy way of 
testing it properly, so I would accept this change without that (but it might 
merit a separate JIRA).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink-kubernetes-operator] mbalassi commented on a diff in pull request #558: [FLINK-31303] Expose Flink application resource usage via metrics and status

Reply via email to