[GitHub] [flink-kubernetes-operator] mbalassi commented on a diff in pull request #558: [FLINK-31303] Expose Flink application resource usage via metrics and status

via GitHub Fri, 31 Mar 2023 03:12:15 -0700


mbalassi commented on code in PR #558:
URL: https://github.com/apache/flink-kubernetes-operator/pull/558#discussion_r1154290969



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java:
##########
@@ -627,14 +637,42 @@ public Map<String, String> getClusterInfo(Configuration conf) throws Exception {
                                             .toSeconds(),
                                     TimeUnit.SECONDS);
 
-            runtimeVersion.put(
+            clusterInfo.put(
                     DashboardConfiguration.FIELD_NAME_FLINK_VERSION,
                     dashboardConfiguration.getFlinkVersion());
-            runtimeVersion.put(
+            clusterInfo.put(
                     DashboardConfiguration.FIELD_NAME_FLINK_REVISION,
                     dashboardConfiguration.getFlinkRevision());
         }
-        return runtimeVersion;
+
+        // JobManager resource usage can be deduced from the CR
+        var jmParameters =
+                new KubernetesJobManagerParameters(
+                        conf, new KubernetesClusterClientFactory().getClusterSpecification(conf));
+        var jmTotalCpu =
+                jmParameters.getJobManagerCPU()
+                        * jmParameters.getJobManagerCPULimitFactor()
+                        * jmParameters.getReplicas();
+        var jmTotalMemory =
+                Math.round(
+                        jmParameters.getJobManagerMemoryMB()
+                                * Math.pow(1024, 2)
+                                * jmParameters.getJobManagerMemoryLimitFactor()
+                                * jmParameters.getReplicas());
+
+        // TaskManager resource usage is best gathered from the REST API to get current replicas

Review Comment:
   There is also a limit factor for TaskManager CPU that Flink allows to be 
configured on top of the resources defined at the Kubernetes level, which should 
be factored in similarly to how I calculated the JobManager resources. I set up 
an example to validate your suggestion, with one JM and one TM, each with 0.5 
CPUs configured in the `resource` field. The CPU limit factors are 1.0. We end 
up with 1.5 CPUs in total (0.5 for the JM, reported accurately, and 1.0 for 
the TM).
   
   ```
     jobManager:
       replicas: 1
       resource:
         cpu: 0.5
         memory: 2048m
     serviceAccount: flink
     taskManager:
       resource:
         cpu: 0.5
         memory: 2048m
   status:
     clusterInfo:
       flink-revision: DeadD0d0 @ 1970-01-01T01:00:00+01:00
       flink-version: 1.16.1
       tm-cpu-limit-factor: "1.0"
       jm-cpu-limit-factor: "1.0"
       total-cpu: "1.5"
       total-memory: "4294967296"
     jobManagerDeploymentStatus: READY
   ```
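   As a check of the arithmetic, the totals in the status above can be 
reproduced by hand. This is a minimal sketch, not code from the PR, and the 
class and method names are hypothetical. It assumes the TaskManager CPU is the 
integer core count reported by the REST API (1 here), and that the JobManager 
and the TaskManager each contribute their configured 2048m of memory with a 
memory limit factor of 1.0.

```java
import java.util.Locale;

/** Hypothetical helper that reproduces the totals from the example status. */
public class ExampleTotals {

    // JobManager CPU as computed in the diff: cpu * limitFactor * replicas.
    static double jmTotalCpu(double cpu, double limitFactor, int replicas) {
        return cpu * limitFactor * replicas;
    }

    // Memory in bytes, mirroring the diff's Math.round(memoryMB * 1024^2 * limitFactor * replicas).
    static long totalMemoryBytes(long memoryMb, double limitFactor, int replicas) {
        return Math.round(memoryMb * Math.pow(1024, 2) * limitFactor * replicas);
    }

    public static void main(String[] args) {
        double jmCpu = jmTotalCpu(0.5, 1.0, 1);  // 0.5, derived from the CR
        double tmCpuFromRest = 1.0;              // assumption: integer core count from the REST API
        double totalCpu = jmCpu + tmCpuFromRest; // matches total-cpu in the status

        long totalMemory = totalMemoryBytes(2048, 1.0, 1)  // JobManager
                + totalMemoryBytes(2048, 1.0, 1);          // TaskManager (assumed 2048m as configured)

        System.out.printf(Locale.ROOT, "total-cpu=%.1f total-memory=%d%n", totalCpu, totalMemory);
    }
}
```

Running it prints `total-cpu=1.5 total-memory=4294967296`, which matches the status: the extra 0.5 CPU comes entirely from the TaskManager being counted as a whole core.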
   
   It is a bit of a tough problem, because the Flink UI also shows 1 core for 
the TM (using the same value that we get from the REST API).
   
   <img width="1403" alt="Screenshot 2023-03-31 at 12 08 26" src="https://user-images.githubusercontent.com/5990983/229091963-f5e9a985-2ebe-4518-9623-6a4d4da9ad3c.png">
   
   So ultimately we have to decide whether to stick with the values Flink 
reports or with the Kubernetes resource definitions. I am leaning towards the 
latter, factoring in the limit factor but avoiding the rounding.
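   A minimal sketch of the Kubernetes-based option, assuming each component 
contributes its configured CPU multiplied by its CPU limit factor and its 
replica count, with no rounding. The `KubernetesCpuTotals` and `Component` 
names are hypothetical. Following the diff comment, the TaskManager replica 
count would still come from the REST API, while the per-TaskManager CPU and 
limit factor would come from the configuration.

```java
/** Hypothetical sketch: Kubernetes-based CPU total without rounding. */
public class KubernetesCpuTotals {

    /** One component: the JobManager(s) or the TaskManager pool. */
    static final class Component {
        final double cpuPerPod;      // CPU configured in the CR's resource field
        final double cpuLimitFactor; // Flink CPU limit factor for this component
        final int replicas;          // for TaskManagers, e.g. the count reported by the REST API

        Component(double cpuPerPod, double cpuLimitFactor, int replicas) {
            this.cpuPerPod = cpuPerPod;
            this.cpuLimitFactor = cpuLimitFactor;
            this.replicas = replicas;
        }

        // Fractional CPUs are kept as-is; nothing is rounded up to whole cores.
        double totalCpu() {
            return cpuPerPod * cpuLimitFactor * replicas;
        }
    }

    static double totalCpu(Component jobManager, Component taskManagers) {
        return jobManager.totalCpu() + taskManagers.totalCpu();
    }

    public static void main(String[] args) {
        // The example above: one JM and one TM, 0.5 CPU each, limit factors 1.0.
        Component jm = new Component(0.5, 1.0, 1);
        Component tm = new Component(0.5, 1.0, 1);
        System.out.println(totalCpu(jm, tm)); // prints 1.0
    }
}
```

For the example this yields 1.0 CPU instead of 1.5, because the TaskManager is counted as 0.5 * 1.0 * 1 rather than as one full core. The trade-off is that the reported total no longer matches what the Flink UI shows for the TaskManager.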



