asafm commented on PR #16832:
URL: https://github.com/apache/pulsar/pull/16832#issuecomment-1216226129

   Ok, I did some reading and this is what I found:
   
   1. Usage calculation in `OperatingSystemMXBean` which delegates to 
`OperatingSystemImp` gives you the percentage of usage of CPU relative to the 
limit imposed on the container group, as can be seen here:
   ```java
               long quota = containerMetrics.getCpuQuota();
               long share = containerMetrics.getCpuShares();
               if (quota > 0) {
                   long numPeriods = containerMetrics.getCpuNumPeriods();
                   long quotaNanos = TimeUnit.MICROSECONDS.toNanos(quota * 
numPeriods);
                   return getUsageDividesTotal(cpuUsageSupplier().getAsLong(), 
quotaNanos);
   ```
   
   which calls
   ```java
           private double getUsageDividesTotal(long usageTicks, long 
totalTicks) {
               // If cpu quota or cpu shares are in effect. Calculate the cpu 
load
               // based on the following formula (similar to how
               // getCpuLoad0() is being calculated):
               //
               //   | usageTicks - usageTicks' |
               //  ------------------------------
               //   | totalTicks - totalTicks' |
               //
               // where usageTicks' and totalTicks' are historical values
               // retrieved via an earlier call of this method.
               if (usageTicks < 0 || totalTicks <= 0) {
                   return -1;
               }
               long distance = usageTicks - this.usageTicks;
               this.usageTicks = usageTicks;
               long totalDistance = totalTicks - this.totalTicks;
               this.totalTicks = totalTicks;
               double systemLoad = 0.0;
               if (distance > 0 && totalDistance > 0) {
                   systemLoad = ((double)distance) / totalDistance;
               }
               // Ensure the return value is in the range 0.0 -> 1.0
               systemLoad = Math.max(0.0, systemLoad);
               systemLoad = Math.min(1.0, systemLoad);
               return systemLoad;
           }
   ```
   
   So we can obtain the percentage of CPU used relative to CPU allocated to the 
container (container group).
   As you can see the code pasted above also does delta calculation, relative 
to last time it was called. It is the same as we do as can be see in Pulsar 
code here:
   
   ```java
       private double getTotalCpuUsageForCGroup(double elapsedTimeSeconds) {
           double usage = getCpuUsageForCGroup();
           double currentUsage = usage - lastCpuUsage;
           lastCpuUsage = usage;
           return 100 * currentUsage / elapsedTimeSeconds / 
TimeUnit.SECONDS.toNanos(1);
       }
   ```
   
   The main point here is that JDK does provide relative usage or as you said 
"mean" value.
   
   2. The biggest difference I see is that when we report usage percent, we do 
so relative to the entire host: we take the CPU Usage for CGroup as reported by 
the operating system  (measured in microseconds), only the delta from last time 
measured, and divide that by elapsed, so in effect, it is CPU used relative to 
the entire host.
   
   ```java
       public static long getCpuUsageForCGroup() {
           try {
               if (metrics != null && getCpuUsageMethod != null) {
                   return (long) getCpuUsageMethod.invoke(metrics);
               }
   
   ```
   
   Which delegates to
   ```java
       /**
        * Returns the aggregate time, in nanoseconds, consumed by all
        * tasks in the Isolation Group.
        *
        * @return Time in nanoseconds, -1 if unknown or
        *         -2 if the metric is not supported.
        *
        */
       public long getCpuUsage();
   ```
   
   The limit we report is the percentage of CPU allocated, relative to the 
entire host. 
   
   It actually looked a bit odd to me: Why do we report usage relative to host 
and limit relative to host? Why not just report usage in percent relative to 
the allocated container limit?
   
   So I looked at how we are using it, as you also mentioned above. So I saw 
those references:
   
   ```java
               if (entry.getValue().limit > 0 && entry.getValue().limit > 
entry.getValue().usage) {
                   Double temp = ((entry.getValue().limit - 
entry.getValue().usage) / entry.getValue().limit) * 100;
                   percentAvailable = temp.intValue();
               }
   ```
   In the above code you can see it doesn't really need it the percent to be 
relative to host nor does it need the limit. It just needs to say: CPU usage in 
percent relative to container allocated CPU limit < 100%, which the JDK gives.
   
   ```java
              double cpuChange = (newUsage.cpu.limit > 0)
                               ? ((newUsage.cpu.usage - oldUsage.cpu.usage) * 
100 / newUsage.cpu.limit)
                               : 0;
   ```
   
   In the above code, you can see it only needs to know how much percent has 
changed - same thing - it can use the JDK load percent.
   
   The biggest issue I have is only with the reporting
   ```java
       private void printResourceUsage(String name, ResourceUsage usage) {
           spec.console().println(name + " : usage = " + usage.usage + ", limit 
= " + usage.limit);
       }
   ```
   
   I don't really understand why would some need to know the limit. Why isn't 
the CPU percentage enough relative to the container allocation?
   
   I understand all I mentioned here is a bit of technical debt to fix in this 
PR. I'm saying that we didn't mind not reporting the limit, we didn't need to 
be very operating system specific, and write Linux-based reporting classes and 
read the info of it from the file system and now through JVM internal classes. 
We could just use `OperatingSystemMXBean`.
   
   
   @heesung-sn What do you think?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to