mbalassi commented on code in PR #558:
URL:
https://github.com/apache/flink-kubernetes-operator/pull/558#discussion_r1154290969
##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java:
##########
@@ -627,14 +637,42 @@ public Map<String, String> getClusterInfo(Configuration conf) throws Exception {
.toSeconds(),
TimeUnit.SECONDS);
- runtimeVersion.put(
+ clusterInfo.put(
DashboardConfiguration.FIELD_NAME_FLINK_VERSION,
dashboardConfiguration.getFlinkVersion());
- runtimeVersion.put(
+ clusterInfo.put(
DashboardConfiguration.FIELD_NAME_FLINK_REVISION,
dashboardConfiguration.getFlinkRevision());
}
- return runtimeVersion;
+
+ // JobManager resource usage can be deduced from the CR
+ var jmParameters =
+ new KubernetesJobManagerParameters(
+ conf, new KubernetesClusterClientFactory().getClusterSpecification(conf));
+ var jmTotalCpu =
+ jmParameters.getJobManagerCPU()
+ * jmParameters.getJobManagerCPULimitFactor()
+ * jmParameters.getReplicas();
+ var jmTotalMemory =
+ Math.round(
+ jmParameters.getJobManagerMemoryMB()
+ * Math.pow(1024, 2)
+ * jmParameters.getJobManagerMemoryLimitFactor()
+ * jmParameters.getReplicas());
+
+ // TaskManager resource usage is best gathered from the REST API to get current replicas
Review Comment:
Flink allows a limit factor for TaskManager cores to be configured on top of
the resources defined at the Kubernetes level, similar to how I calculated the
JobManager resources. To validate your suggestion, I set up an example with one
JM and one TM, each with 0.5 CPUs configured in its resources field. The CPU
limit factors are 1.0. We end up with 1.5 CPUs (0.5 for the JM, accurately
reported, and 1.0 for the TM).
```
jobManager:
  replicas: 1
  resource:
    cpu: 0.5
    memory: 2048m
serviceAccount: flink
taskManager:
  resource:
    cpu: 0.5
    memory: 2048m
status:
  clusterInfo:
    flink-revision: DeadD0d0 @ 1970-01-01T01:00:00+01:00
    flink-version: 1.16.1
    tm-cpu-limit-factor: "1.0"
    jm-cpu-limit-factor: "1.0"
    total-cpu: "1.5"
    total-memory: "4294967296"
  jobManagerDeploymentStatus: READY
```
It is a bit of a tough problem, because the Flink UI also shows 1 core for
the TM (using the same value that we get from the REST API).
<img width="1403" alt="Screenshot 2023-03-31 at 12 08 26"
src="https://user-images.githubusercontent.com/5990983/229091963-f5e9a985-2ebe-4518-9623-6a4d4da9ad3c.png">
So ultimately we have to decide whether to stick with Flink or with
Kubernetes. I am leaning towards the latter (factoring in the limit factor,
but avoiding the rounding).
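
To make the difference concrete, here is a minimal sketch of the two ways of totalling resources for the example above. The class and method names (`ResourceTotals`, `totalCpu`, `totalMemoryBytes`) are hypothetical and not part of the operator, and modelling the Flink-reported TM value as a round-up of the fractional CPU is my assumption about where the 1.0 comes from.

```java
public class ResourceTotals {

    // Kubernetes-side view: CPU from the CR, scaled by the limit factor, no rounding.
    static double totalCpu(double cpuPerReplica, double limitFactor, int replicas) {
        return cpuPerReplica * limitFactor * replicas;
    }

    // Memory in bytes from a per-replica size in MiB, mirroring the JM calculation in the diff.
    static long totalMemoryBytes(long memoryMb, double limitFactor, int replicas) {
        return Math.round(memoryMb * Math.pow(1024, 2) * limitFactor * replicas);
    }

    public static void main(String[] args) {
        double jmCpu = totalCpu(0.5, 1.0, 1);
        double tmCpuFromCr = totalCpu(0.5, 1.0, 1);

        // Assumption: the REST API reports whole cores, modelled here as rounding up.
        double tmCpuFromRest = Math.ceil(tmCpuFromCr);

        System.out.println("Kubernetes-based total CPU: " + (jmCpu + tmCpuFromCr));   // 1.0
        System.out.println("REST-based total CPU:       " + (jmCpu + tmCpuFromRest)); // 1.5
        System.out.println("Total memory (bytes):       "
                + (totalMemoryBytes(2048, 1.0, 1) + totalMemoryBytes(2048, 1.0, 1))); // 4294967296
    }
}
```

With the CR values from the example, the Kubernetes-based total is 1.0 CPU, while mixing in the rounded TM value reproduces the 1.5 shown in `total-cpu`. The memory total matches `total-memory` either way.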