Hi Martin,

Just as a side note: I disabled the cpu system test and tried again, and
it seems that the issue is present with all the cpu monitoring.
I restarted httpd, as I knew it would trigger an alert, and these were
the results.
Date: Thu, 08 Dec 2011 10:27:59
Action: alert
Host: <hostname removed>
Description: cpu user usage of 100.0% matches resource limit [cpu
user usage>70.0%]
I ran vmstat 1 10 at the same time; as you can see, the spike is on the
4th line of output.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 739220 142536 973532    0    0     4     7   10    6  0  0 99  0  0
 0  0      0 739088 142536 973532    0    0     0     0  114  160  0  1 99  0  0
 3  0      0 739088 142536 973536    0    0     0     0  126  169  1  2 97  0  0
 0  0      0 737336 142536 973544    0    0     0   168  721  796 35 14 50  1  0
 1  0      0 736964 142536 973544    0    0     0     0  109  160  1  1 98  0  0
To make it a little simpler, I also ran sar 1 10, as its output is more
human readable.
10:27:55     CPU     %user     %nice   %system   %iowait    %steal     %idle
10:27:56     all      1.01      0.00      1.01      0.00      0.00     97.98
10:27:57     all      0.00      0.00      1.00      0.00      0.00     99.00
10:27:58     all      3.96      0.00      3.96      0.00      0.00     92.08
10:27:59     all     32.00      0.00     12.00      1.00      0.00     55.00
Something struck me as odd while testing this. Yesterday's result, a
reported 50% system usage against an actual 15.84%, means the reported
usage is about 3.2 times the actual. Today's reported user usage of 100%
is about 3.1 times the actual 32%. So it seems I just need to work out
why it is multiplying the results.
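
As a quick sanity check of that multiplier, here is a minimal Python
sketch that divides each value monit reported by the value measured at
the same moment. It uses only the figures quoted in this thread; the
labels and the function name are made up for the illustration.

```python
# Reported vs. measured CPU percentages quoted in this thread.
# The dictionary keys are labels chosen for this sketch only.
samples = {
    "system, 7 Dec (monit 50.0% vs /proc/stat 15.84%)": (50.0, 15.84),
    "user, 8 Dec (monit 100.0% vs sar 32.00%)": (100.0, 32.00),
}


def overreport_factor(reported: float, measured: float) -> float:
    """Return how many times larger the reported value is than the measured one."""
    return reported / measured


factors = {label: overreport_factor(r, m) for label, (r, m) in samples.items()}

for label, factor in factors.items():
    print(f"{label}: x{factor:.3f}")
```

Both factors come out between 3.1 and 3.2, which is consistent with a
constant multiplier, although two data points are not enough to be sure.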
Regards
Wayne
On 7 December 2011 11:43, Lawrence, Wayne wrote:
> Hi Martin,
>
> I downloaded the source from the Monit website and compiled it on the
> server.
> I have started monit in verbose mode and this is the relevant information
> it outputs when the event occurs.
>
> cpu system usage of 50.0% matches resource limit [cpu system
> usage>30.0%]
>
> -------------------------------------------------------------------------------
> ../tools/bin/monit() [0x41a533]
> ../tools/bin/monit(LogError+0x9f) [0x41ad2f]
> ../tools/bin/monit(Event_post+0x328) [0x417ba8]
> ..t/tools/bin/monit() [0x428071]
> ../tools/bin/monit(check_system+0x2b) [0x4285bb]
> ../tools/bin/monit(validate+0x226) [0x42ad16]
> ../tools/bin/monit() [0x41422d]
> ../tools/bin/monit(main+0x511) [0x4149e1]
> /lib64/libc.so.6(__libc_start_main+0xfd) [0x3592c1ecdd]
> ../tools/bin/monit() [0x40b179]
>
> -------------------------------------------------------------------------------
> Unfortunately remote access is not an option but I will happily run a
> debug version to try and track down this problem as I really would like to
> use Monit for my current build.
>
> Regards
>
> Wayne
> On 7 December 2011 11:17, Martin Pala wrote:
>
>> Thanks for the data.
>>
>> The /proc/stat format is this:
>>
>> cpu <user> <nice> <system> <idle> <wait> <irq> <softirq>
>>
>> The values are cumulative counters of cpu time in clock ticks, so if we
>> subtract the corresponding values of consecutive samples from your
>> output, we get this:
>>
>>           user  nice  system  idle  wait  irq  softirq | total
>> 09:57:35     1     0       1    99     0    0        0 |   101
>> 09:57:36     1     0       0    98     0    0        0 |    99
>> 09:57:37    25     0      16    59     1    0        0 |   101
>> 09:57:38     1     0       2    98     0    0        0 |   101
>>
>> => at 09:57:37 the cpu usage was:
>>
>> user = 24.75%
>> system = 15.84%
>> wait = 0.99%
>>
>> This corresponds to the previous vmstat output. Monit counts the cpu
>> usage the same way as above and doesn't modify these values => your monit
>> really reports strange cpu usage (reported 50% vs. real ~ 16%).
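
For reference, the delta arithmetic described above can be sketched in
Python as follows. This is only an illustration using two of the
/proc/stat samples quoted further down in this thread; the function
names are invented for the example and this is not monit's actual code.

```python
# Field order of the "cpu" line, per the /proc/stat layout given above.
# Any extra trailing columns (the samples in this thread carry two more)
# are ignored here, matching the seven-column table above.
FIELDS = ("user", "nice", "system", "idle", "wait", "irq", "softirq")


def parse_cpu_line(line: str) -> dict:
    """Turn a 'cpu ...' line from /proc/stat into a {field: counter} dict."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def usage_percent(before: str, after: str) -> dict:
    """Percentage of elapsed ticks spent in each state between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: b[name] - a[name] for name in FIELDS}
    total = sum(delta.values())
    if total == 0:
        raise ValueError("no ticks elapsed between samples")
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


# Samples taken at 09:57:36 and 09:57:37 in this thread.
t0 = "cpu 207062 501 103543 49452451 25303 83 1569 0 0"
t1 = "cpu 207087 501 103559 49452510 25304 83 1569 0 0"

usage = usage_percent(t0, t1)
for name in ("user", "system", "wait"):
    print(f"{name:<6} = {usage[name]:.2f}%")
```

With these two samples the sketch gives user = 24.75%, system = 15.84%
and wait = 0.99%, the same figures as listed above.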
>>
>> What's the origin of your monit binary? Did you compile it from the
>> original source code or from some 3rd party source code distribution
>> (such as the RHEL or Fedora repository)? Or do you use the pre-compiled
>> binaries from www.mmonit.com? Or some 3rd party binary, patches or
>> source code from another site?
>>
>> Please can you try to run monit in verbose mode and provide the full output:
>>
>> 1.) stop monit
>> 2.) run monit in foreground with verbose mode enabled:
>> ./monit -vI
>> 3.) after the problem happens, stop monit with "^C" and send output
>>
>> I can also prepare a debug version which will dump the cpu usage related
>> information, or if you can provide remote access to the system, I can
>> troubleshoot the problem remotely.
>>
>>
>> Regards,
>> Martin
>>
>>
>>
>> On Dec 7, 2011, at 11:07 AM, Lawrence, Wayne wrote:
>>
>> Hi Martin,
>>
>> this is the output of the commands you requested.
>>
>> 1.) uname -m
>>
>> x86_64
>>
>> 2.) file `which monit`
>>
>> ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically
>> linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
>> I also ran the command you supplied to get the cpu usage directly
>> while restarting the httpd service, as I know this will generate an alert.
>>
>>
>>
>> Date: Wed, 07 Dec 2011 09:57:37
>> Action: exec
>> Host: <hostname removed>
>> Description: cpu system usage of 50.0% matches resource limit [cpu
>> system usage>30.0%]
>>
>> Wed Dec 7 09:57:34 GMT 2011
>> cpu 207060 501 103542 49452254 25303 83 1569 0 0
>> Wed Dec 7 09:57:35 GMT 2011
>> cpu 207061 501 103543 49452353 25303 83 1569 0 0
>> Wed Dec 7 09:57:36 GMT 2011
>> cpu 207062 501 103543 49452451 25303 83 1569 0 0
>> Wed Dec 7 09:57:37 GMT 2011
>> cpu 207087 501 103559 49452510 25304 83 1569 0 0
>> Wed Dec 7 09:57:38 GMT 2011
>> cpu 207088 501 103561 49452608 25304 83 1569 0 0
>> Wed Dec 7 09:57:40 GMT 2011
>> If my understanding of /proc/stat is correct, this still doesn't make any
>> sense, but I may be wrong.
>>
>> Regards
>>
>> Wayne
>>
>>
>>
>> On 7 December 2011 09:37, Martin Pala wrote:
>>
>>> Please can you check that your monit binary matches the system
>>> architecture? (i.e. for example 64-bit monit binary on 64-bit system - not
>>> 32-bit monit on 64-bit system)
>>>
>>> To verify, please provide the output of the following commands:
>>> 1.) uname -m
>>> 2.) file `which monit`
>>>
>>> Monit takes the statistics from the /proc/stat kernel interface. You can
>>> collect the statistics manually like this - for example to fetch the state
>>> in 1 second intervals (30 samples):
>>>
>>> $ for ((i=0; i<30; i++)); do date; grep "cpu " /proc/stat; sleep 1; done
>>>
>>> Note: monit takes the first /proc/stat line ("cpu"), which contains the
>>> overall cpu usage in the system (summary of all cpus). /proc/stat also
>>> contains per-cpu statistics; if you want to collect all of them, simply
>>> replace the grep "cpu " with cat.
>>>
>>> Regards,
>>> Martin
>>>
>>>
>>> On Dec 7, 2011, at 10:04 AM, Lawrence, Wayne wrote:
>>>
>>> Hi Martin,
>>>
>>> I have tried various methods to identify the cause of this, and took your
>>> advice and used vmstat. I simply restarted the httpd process from the monit
>>> web interface while the command was running and got the following warning.
>>>
>>> Description: cpu system usage of 50.0% matches resource limit
>>> [cpu system usage>30.0%]
>>>
>>> But vmstat doesn't show that level of usage at the point of the alert. As
>>> you can see, there is some usage in the 3rd line of the output, when I
>>> restarted the httpd service, but it doesn't seem enough to trigger an alert.
>>>
>>> vmstat 1 10
>>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>>>  0  0      0 859596 114684 856908    0    0     4     6   81   77  0  0 99  0  0
>>>  0  0      0 859448 114684 856916    0    0     0     0  100   94  1  0 99  0  0
>>>  0  0      0 898352 114692 815600    0    0     0   168  555  605 23 15 61  1  0
>>>
>>> I am not sure if there are any other tests I can run to narrow this down
>>> a bit further, as it still isn't making sense.
>>>
>>> Regards
>>>
>>> Wayne
>>>
>>>
>>>
>>>
>>>
>>> On 7 December 2011 08:27, Martin Pala wrote:
>>>
>>>> Hi Lawrence,
>>>>
>>>> the test which triggers the alert is the "system" cpu test, i.e. the time
>>>> the system spends in kernel mode. The cpu usage could be caused by some
>>>> background kernel task. To verify that the monit report matches the
>>>> system cpu usage, you should use either "vmstat" or "top" instead of "ps".
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> On Dec 6, 2011, at 1:19 PM, Lawrence, Wayne wrote:
>>>>
>>>> Hi Igor,
>>>>
>>>> the operating system is RHEL6 and monit version is 5.3.1
>>>>
>>>> this is what i have in my config
>>>>
>>>> if cpu usage (user) > 70% then alert
>>>> if cpu usage (system) > 30% then alert
>>>> if cpu usage (wait) > 20% then alert
>>>>
>>>> this is one of the errors
>>>> Description: cpu system usage of 50.0% matches resource limit [cpu
>>>> system usage>30.0%]
>>>>
>>>> this is what i get in /var/log/messages
>>>> Dec 6 12:01:29 <hostname-removed> monit[864]: <hostname-removed> cpu
>>>> system usage of 50.0% matches resource limit [cpu system usage>30.0%]
>>>> Dec 6 12:02:29 <hostname-removed> monit[864]:
>>>> <hostname-removed><hostname-removed>' cpu system usage check succeeded
>>>> [current cpu system usage=0.9%]
>>>>
>>>> this is the output of
>>>> ps --no-headers -A -o "%cpu sz ucomm" | sort -k1nr | head -20
>>>>
>>>> 12:01:29 up 4 days, 20:24, 2 users, load average: 0.04, 0.01, 0.00
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:       2055108    1092176     962932          0      53156     811864
>>>> -/+ buffers/cache:     227156    1827952
>>>> Swap:      4128760          0    4128760
>>>> 1.2 44308 perl
>>>> 0.0 0 aio/0
>>>> 0.0 0 async/mgr
>>>> 0.0 0 ata/0
>>>> 0.0 0 ata_aux
>>>> 0.0 0 bdi-default
>>>> 0.0 0 cpuset
>>>> 0.0 0 crypto/0
>>>> 0.0 0 events/0
>>>> 0.0 0 ext4-dio-unwrit
>>>> 0.0 0 flush-253:0
>>>> 0.0 0 jbd2/dm-0-8
>>>> 0.0 0 kacpi_hotplug
>>>> 0.0 0 kacpi_notify
>>>> 0.0 0 kacpid
>>>> 0.0 0 kauditd
>>>> 0.0 0 kblockd/0
>>>> 0.0 0 kdmflush
>>>> 0.0 0 khelper
>>>> 0.0 0 khubd
>>>>
>>>> I have to say I am at a total loss, as there is no way the usage figures
>>>> are accurate.
>>>> If there is any other info I can supply that will be useful, please let
>>>> me know.
>>>>
>>>> Regards
>>>>
>>>> Wayne
>>>>
>>>>
>>>> On 6 December 2011 12:03, Igor Homyakov wrote:
>>>>
>>>>> Hi Lawrence,
>>>>>
>>>>> Could you be a little bit more specific? Please provide information
>>>>> about your operating system, the monit version on which the problem
>>>>> occurred and so on.
>>>>>
>>>>> Regards
>>>>> Igor Homyakov
>>>>>
>>>>> On Tue, Dec 6, 2011 at 15:35, Lawrence, Wayne wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I have a few CPU usage checks in my monitrc but it seems monit is
>>>>> > misreporting the usage.
>>>>> >
>>>>> > I have run several tests and it seems that monit is multiplying the
>>>>> > actual usage by 10.
>>>>> >
>>>>> > I ran a process with top running in another shell and CPU usage for
>>>>> > the user was never above 10% yet monit informed me that there was
>>>>> > 100% cpu usage.
>>>>> >
>>>>> > I have tried various configurations including the one that came with
>>>>> > the default config for system cpu monitoring and all seem to
>>>>> > demonstrate the same issue.
>>>>> >
>>>>> > Any advice welcomed on this
>>>>> >
>>>>> > Regards
>>>>> >
>>>>> > Wayne Lawrence
>>>>>
>>>>
>>
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general
>>
>
>