Hi, Martin:
Can you clarify what exactly these two lines do in process.c's cpu percentage
calculation?
    if (pt[i].cpu_percent > 1000 / systeminfo.cpus)
        pt[i].cpu_percent = 1000 / systeminfo.cpus;
They're causing total cpu to be misreported when processes use a large amount
of CPU and many cores are present. Shouldn't the "/ systeminfo.cpus" be
dropped in both cases? I assume it's meant to keep any strange math from
causing process cpu percentage to ever exceed 100%.
For example, with a 120s query delay, a process I have on a 24-core box
works out as follows under process.c's logic:

    cputime = 4809915, cputime_prev = 4803601 (delta 6314)
    time = 13258814089.516930, time_prev = 13258812889.395201 (delta 1200)
    (cputime - cputime_prev) / (time - time_prev) = 6314 / 1200 = 5.26
    1000 * 5.26 / 24 cpus = 219

That 219 is "pt[i].cpu_percent" (which appears to represent 21.9% in
monitese), which is accurate. But 1000 / num_cpus is 41.6 on my box, and
since 219 >> 41.6 it gets cut back to 41.6, which is the 4.1% monit reports.
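
The arithmetic above can be sketched in C as follows. This is only an
illustration under stated assumptions, not monit's actual code: the function
names cpu_permille and clamp are hypothetical, and the integer cap is an
assumption that happens to agree with the 4.1% monit prints.

```c
#include <stdio.h>

/* Hypothetical helper (not from process.c): CPU usage in tenths of a
 * percent of the whole machine, given a cputime delta and a wall-clock
 * delta expressed in the same time unit. */
static int cpu_permille(long cputime_delta, double time_delta, int cpus)
{
    return (int)(1000.0 * (double)cputime_delta / time_delta / cpus);
}

/* Hypothetical helper: limit value to at most cap. */
static int clamp(int value, int cap)
{
    return value > cap ? cap : value;
}
```

With the figures from this box, cpu_permille(6314, 1200.0, 24) gives 219.
clamp(219, 1000 / 24) cuts that back to 41 (4.1%), while clamp(219, 1000)
leaves it at 219 (21.9%), which is the behaviour I would expect.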
Thanks,
-t
On Jan 5, 2012, at 4:33 AM, Martin Pala wrote:
> Yes, Wayne is correct and the usage is computed exactly as he described.
> Monit takes the summary of all CPU cores as 100%.
>
> Regards,
> Martin
>
>
>
> On Jan 5, 2012, at 10:54 AM, Lawrence, Wayne wrote:
>
>> I may be wrong, and I am sure someone will correct me if I am, but it
>> appears that the way CPU usage is worked out across multiple cores is why
>> you are getting this output.
>>
>> The way I worked it out is the way I believe monit works it out, and the
>> maths sort of makes sense.
>>
>> 24 cores: 24 x 100% = 2400
>>
>> So if you divide 2400 by your usage from top:
>>
>> 2400 / 578 = 4.2
>>
>> which would give you the percentage shown in monit.
>>
>> Regards
>>
>> Wayne
>>
>>
>>
>>
>> On 5 January 2012 08:13, Tom Pepper wrote:
>> Hello:
>>
>> I have a number of high-CPU processes that run on 24-core boxes configured
>> e.g.:
>>
>> check process emr-enc01-01 with pidfile /var/run/tada_liveenc_emr-enc01-01.pid
>>   start program = "/usr/local/tada/launch.sh -c emr-enc01-01"
>>   stop program = "/bin/bash -c 'kill -s SIGTERM `/bin/cat /var/run/tada_liveenc_emr-enc01-01.pid`'"
>>   if totalmem > 80% then alert
>>   if totalmem > 90% then restart
>>   if totalcpu < 10% for 10 cycles then alert
>>
>> These processes create pidfiles which match correctly in top as:
>>
>>   PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
>>  1710 root  20  0 3064m 1.2g 7808 S  578 15.8 47:31.53 tada_liveenc
>>  1866 root  20  0 2954m 1.3g 7804 S  545 16.7 45:18.52 tada_liveenc
>>
>>
>> However, monit sees these as a completely different total CPU usage:
>>
>> Process 'emr-enc01-01'
>>   status                   Running
>>   monitoring status        Monitored
>>   pid                      1710
>>   parent pid               1
>>   uptime                   8m
>>   children                 0
>>   memory kilobytes         1372300
>>   memory kilobytes total   1372300
>>   memory percent           16.7%
>>   memory percent total     16.7%
>>   cpu percent              4.1%
>>   cpu percent total        4.1%
>>   data collected           Thu, 05 Jan 2012 00:05:49
>>
>> Process 'emr-enc01-02'
>>   status                   Running
>>   monitoring status        Monitored
>>   pid                      1866
>>   parent pid               1
>>   uptime                   8m
>>   children                 0
>>   memory kilobytes         1362240
>>   memory kilobytes total   1362240
>>   memory percent           16.6%
>>   memory percent total     16.6%
>>   cpu percent              4.1%
>>   cpu percent total        4.1%
>>   data collected           Thu, 05 Jan 2012 00:05:49
>>
>> Any thoughts on why this might be happening? The hosts run Ubuntu Natty.
>> The master processes themselves spawn about 150 threads (not forks).
>>
>> FYI:
>>
>> 662 root@enc01[tada]: $ uname -m
>> x86_64
>>
>> 663 root@enc01[tada]: $ file `which monit`
>> /usr/local/bin/monit: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
>> dynamically linked (uses shared libs), for GNU/Linux 2.6.0, not stripped
>>
>> 664 root@enc01[tada]: $ monit -V
>> This is Monit version 5.3.2
>> Copyright (C) 2000-2011 Tildeslash Ltd. All Rights Reserved.
>>
>> Thanks in advance,
>> -Tom
>>
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general