asafm commented on PR #16832:
URL: https://github.com/apache/pulsar/pull/16832#issuecomment-1216226129
Ok, I did some reading and this is what I found:
1. Usage calculation in `OperatingSystemMXBean` which delegates to
`OperatingSystemImp` gives you the percentage of usage of CPU relative to the
limit imposed on the container group, as can be seen here:
```java
long quota = containerMetrics.getCpuQuota();
long share = containerMetrics.getCpuShares();
if (quota > 0) {
long numPeriods = containerMetrics.getCpuNumPeriods();
long quotaNanos = TimeUnit.MICROSECONDS.toNanos(quota *
numPeriods);
return getUsageDividesTotal(cpuUsageSupplier().getAsLong(),
quotaNanos);
```
which calls
```java
private double getUsageDividesTotal(long usageTicks, long
totalTicks) {
// If cpu quota or cpu shares are in effect. Calculate the cpu
load
// based on the following formula (similar to how
// getCpuLoad0() is being calculated):
//
// | usageTicks - usageTicks' |
// ------------------------------
// | totalTicks - totalTicks' |
//
// where usageTicks' and totalTicks' are historical values
// retrieved via an earlier call of this method.
if (usageTicks < 0 || totalTicks <= 0) {
return -1;
}
long distance = usageTicks - this.usageTicks;
this.usageTicks = usageTicks;
long totalDistance = totalTicks - this.totalTicks;
this.totalTicks = totalTicks;
double systemLoad = 0.0;
if (distance > 0 && totalDistance > 0) {
systemLoad = ((double)distance) / totalDistance;
}
// Ensure the return value is in the range 0.0 -> 1.0
systemLoad = Math.max(0.0, systemLoad);
systemLoad = Math.min(1.0, systemLoad);
return systemLoad;
}
```
So we can obtain the percentage of CPU used relative to CPU allocated to the
container (container group).
As you can see the code pasted above also does delta calculation, relative
to last time it was called. It is the same as we do as can be see in Pulsar
code here:
```java
private double getTotalCpuUsageForCGroup(double elapsedTimeSeconds) {
double usage = getCpuUsageForCGroup();
double currentUsage = usage - lastCpuUsage;
lastCpuUsage = usage;
return 100 * currentUsage / elapsedTimeSeconds /
TimeUnit.SECONDS.toNanos(1);
}
```
The main point here is that JDK does provide relative usage or as you said
"mean" value.
2. The biggest difference I see is that when we report usage percent, we do
so relative to the entire host: we take the CPU Usage for CGroup as reported by
the operating system (measured in microseconds), only the delta from last time
measured, and divide that by elapsed, so in effect, it is CPU used relative to
the entire host.
```java
public static long getCpuUsageForCGroup() {
try {
if (metrics != null && getCpuUsageMethod != null) {
return (long) getCpuUsageMethod.invoke(metrics);
}
```
Which delegates to
```java
/**
* Returns the aggregate time, in nanoseconds, consumed by all
* tasks in the Isolation Group.
*
* @return Time in nanoseconds, -1 if unknown or
* -2 if the metric is not supported.
*
*/
public long getCpuUsage();
```
The limit we report is the percentage of CPU allocated, relative to the
entire host.
It actually looked a bit odd to me: Why do we report usage relative to host
and limit relative to host? Why not just report usage in percent relative to
the allocated container limit?
So I looked at how we are using it, as you also mentioned above. So I saw
those references:
```java
if (entry.getValue().limit > 0 && entry.getValue().limit >
entry.getValue().usage) {
Double temp = ((entry.getValue().limit -
entry.getValue().usage) / entry.getValue().limit) * 100;
percentAvailable = temp.intValue();
}
```
In the above code you can see it doesn't really need it the percent to be
relative to host nor does it need the limit. It just needs to say: CPU usage in
percent relative to container allocated CPU limit < 100%, which the JDK gives.
```java
double cpuChange = (newUsage.cpu.limit > 0)
? ((newUsage.cpu.usage - oldUsage.cpu.usage) *
100 / newUsage.cpu.limit)
: 0;
```
In the above code, you can see it only needs to know how much percent has
changed - same thing - it can use the JDK load percent.
The biggest issue I have is only with the reporting
```java
private void printResourceUsage(String name, ResourceUsage usage) {
spec.console().println(name + " : usage = " + usage.usage + ", limit
= " + usage.limit);
}
```
I don't really understand why would some need to know the limit. Why isn't
the CPU percentage enough relative to the container allocation?
I understand all I mentioned here is a bit of technical debt to fix in this
PR. I'm saying that we didn't mind not reporting the limit, we didn't need to
be very operating system specific, and write Linux-based reporting classes and
read the info of it from the file system and now through JVM internal classes.
We could just use `OperatingSystemMXBean`.
@heesung-sn What do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]