Re: total cpu process bug?

Tom Pepper Thu, 05 Jan 2012 08:42:17 -0800

Yeah, CPU usage for any process should be calculated the same way regardless of 
whether it is single-threaded or multi-threaded:


In a fixed timeslice: Total cpu time for process n / Total available cpu time 
systemwide = % of the cpu time consumed by process n

e.g. if a single-threaded app on a four-core box used one of the four cores, 
that would be perhaps 1.0 GHz used / 4.0 GHz total available = 0.25 = 25% total CPU
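
A minimal sketch of that calculation in Python, assuming the input is a top-style figure where 100% means one fully busy core. The function name total_cpu_percent is made up for illustration; this is not monit's code.

```python
def total_cpu_percent(top_percent: float, cores: int) -> float:
    """Convert a top-style %CPU (100% = one full core) to a share of the whole box."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    # Total available CPU is cores * 100%, so divide by that and scale back to percent.
    return top_percent / (cores * 100.0) * 100.0


if __name__ == "__main__":
    # One busy core on a four-core box.
    print(total_cpu_percent(100.0, 4))             # 25.0
    # The 578% reading from top on a 24-core box.
    print(round(total_cpu_percent(578.0, 24), 1))  # 24.1
```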

What monit is currently reporting instead is the number of times the given 
process could run concurrently in the same amount of CPU, e.g. 4.0 GHz / 1.0 GHz 
= 4 = 400% or 4 times total.

What isn't obvious to me is why small processes still reflect ~0% total usage, 
when the numbers should be very high.  A process like nginx using very little 
CPU should, with your current math, report as 4.0 GHz / 0.020 GHz = 200 = 20,000%
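
The two ratios can be compared side by side with a small Python sketch. The helper names are hypothetical, and this only reproduces the arithmetic discussed here, not what monit's source actually does.

```python
def share(used: float, available: float) -> float:
    """Fraction of available CPU consumed: used / available."""
    return used / available


def inverted(used: float, available: float) -> float:
    """The swapped fraction: available / used."""
    return available / used


if __name__ == "__main__":
    # (label, used, available), both values in the same units
    samples = [
        ("one core of four", 1.0, 4.0),
        ("small process", 0.020, 4.0),
        ("578% of 2400%", 578.0, 2400.0),
    ]
    for label, used, available in samples:
        print(f"{label}: share={share(used, available):.4f} "
              f"inverted={inverted(used, available):.2f}")
```

The inverted value grows as usage shrinks, so under that math a near-idle process would be expected to show a very large number rather than ~0%.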

Perhaps I'll dig through the source later this week and see what I can uncover.

Thanks,
-t
 
On Jan 5, 2012, at 8:21 AM, Lawrence, Wayne wrote:

> I can see what you mean. It would probably be easier to simply divide the 
> cpu usage by the number of cores, so 548 / 24 = 22.8% total cpu, which would 
> seem correct.
>  
> I did a bit of digging and the explanation of the calculation is here 
> https://savannah.nongnu.org/bugs/index.php?34021
>  
> I am sure Martin will explain it further if required.
>  
> Regards
>  
> Wayne
> 
> On 5 January 2012 16:09, Tom Pepper wrote:
> That would be incorrect though, no?  548% of a theoretical 2400% usage is 
> 548/2400 = 0.2283 = 23%, not 4% of total CPU, obviously.
> 
> What you guys are computing instead, 2400% / 548% ~= 4, is the inverted 
> fraction of the actual number.  The numerator and denominator need to be 
> swapped.
> 
> -t
> 
> On Jan 5, 2012, at 4:33 AM, Martin Pala wrote:
> 
>> Yes, Wayne is correct and the usage is computed exactly as he described. 
>> Monit takes the sum of all CPU cores as 100%.
>> 
>> Regards,
>> Martin
>> 
>> 
>> 
>> On Jan 5, 2012, at 10:54 AM, Lawrence, Wayne wrote:
>> 
>>> I may be wrong, and I am sure someone will correct me if I am, but it 
>>> appears that the way the cpu usage is worked out across the multiple cores 
>>> is why you are getting this output.
>>>  
>>> The way i worked it out is the way i believe monit works it out and the 
>>> maths sort of make sense.
>>>  
>>> 24 cores  24 x 100% = 2400
>>>  
>>> so if you divide 2400 by your usage from top
>>>  
>>> 2400 / 578 = 4.2
>>>  
>>> which would give you your percentage shown in monit.
>>>  
>>> Regards
>>>  
>>> Wayne
>>>  
>>> 
>>> 
>>>  
>>> On 5 January 2012 08:13, Tom Pepper wrote:
>>> Hello:
>>> 
>>> I have a number of high-CPU processes that run on 24-core boxes configured 
>>> e.g.:
>>> 
>>> check process emr-enc01-01 with pidfile /var/run/tada_liveenc_emr-enc01-01.pid
>>>   start program = "/usr/local/tada/launch.sh -c emr-enc01-01"
>>>   stop program = "/bin/bash -c 'kill -s SIGTERM `/bin/cat /var/run/tada_liveenc_emr-enc01-01.pid`'"
>>>   if totalmem > 80% then alert
>>>   if totalmem > 90% then restart
>>>   if totalcpu < 10% for 10 cycles then alert
>>> 
>>> These processes create pidfiles which match correctly in top as:
>>> 
>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>  1710 root      20   0 3064m 1.2g 7808 S  578 15.8  47:31.53 tada_liveenc
>>>  1866 root      20   0 2954m 1.3g 7804 S  545 16.7  45:18.52 tada_liveenc
>>> 
>>> However, monit sees these as a completely different total CPU usage:
>>> 
>>> Process 'emr-enc01-01'
>>>   status                            Running
>>>   monitoring status                 Monitored
>>>   pid                               1710
>>>   parent pid                        1
>>>   uptime                            8m 
>>>   children                          0
>>>   memory kilobytes                  1372300
>>>   memory kilobytes total            1372300
>>>   memory percent                    16.7%
>>>   memory percent total              16.7%
>>>   cpu percent                       4.1%
>>>   cpu percent total                 4.1%
>>>   data collected                    Thu, 05 Jan 2012 00:05:49
>>> 
>>> Process 'emr-enc01-02'
>>>   status                            Running
>>>   monitoring status                 Monitored
>>>   pid                               1866
>>>   parent pid                        1
>>>   uptime                            8m 
>>>   children                          0
>>>   memory kilobytes                  1362240
>>>   memory kilobytes total            1362240
>>>   memory percent                    16.6%
>>>   memory percent total              16.6%
>>>   cpu percent                       4.1%
>>>   cpu percent total                 4.1%
>>>   data collected                    Thu, 05 Jan 2012 00:05:49
>>> 
>>> Any thoughts on why this might be happening?  Hosts are ubuntu natty.  The 
>>> master processes themselves spawn about 150 threads (not forks).
>>> 
>>> FYI:
>>> 
>>> 662 root@enc01[tada]: $ uname -m
>>> x86_64
>>> 
>>> 663 root@enc01[tada]: $ file `which monit`
>>> /usr/local/bin/monit: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
>>> dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
>>> 
>>> 664 root@enc01[tada]: $ monit -V
>>> This is Monit version 5.3.2
>>> Copyright (C) 2000-2011 Tildeslash Ltd. All Rights Reserved.
>>> 
>>> Thanks in advance,
>>> -Tom
>>> 
>>> --
>>> To unsubscribe:
>>> https://lists.nongnu.org/mailman/listinfo/monit-general
>>> 
