Hi,
Monit is set to monitor overall CPU usage: it triggers the alert when the
threshold is reached, but it does not itself analyze which process is
responsible for the load.
You can easily hook in a script that is triggered by high CPU load and
collects information during the peak.
For example, let's create this script as
/tmp/monit_top.sh:
--8<--
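#!/bin/sh
# Send stdout and stderr to the file that is mailed on recovery
exec 1>/tmp/monit_top.out
exec 2>&1
# Remember our PID so the recovery action can stop us
echo $$ > /tmp/monit_top.pid
while true
do
    uptime      # load average
    free        # memory usage
    # top 20 processes by CPU: %cpu, size, command name
    ps --no-headers -A -o "%cpu sz ucomm" | sort -k1,1nr | head -n 20
    echo "#############################"
    sleep 5
done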
--8<--
chmod 755 /tmp/monit_top.sh
and modify the monit configuration like this:
--8<--
check system TamTam
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if memory usage > 75% then alert
if cpu usage (user) > 70% then exec "/tmp/monit_top.sh"
   else if recovered then exec "/bin/bash -c 'kill `cat /tmp/monit_top.pid` && mail -s cpu-usage-alert [email protected] < /tmp/monit_top.out'"
if cpu usage (system) > 30% then alert
if cpu usage (wait) > 20% then alert
--8<--
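Packing kill, cat and mail into one quoted exec string is easy to get wrong (nested quotes in particular). As an alternative, the recovery step could live in its own script and monit would only call exec "/tmp/monit_top_stop.sh". Below is a minimal sketch under these assumptions: the script name /tmp/monit_top_stop.sh, the function stop_and_report and the RECIPIENT variable are all hypothetical, and the paths are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical /tmp/monit_top_stop.sh: stop the collector started by
# /tmp/monit_top.sh and mail what it recorded.

stop_and_report() {
    pidfile=$1 outfile=$2 recipient=$3
    # Kill the collector only if the pid file names a live process
    if [ -r "$pidfile" ]; then
        pid=$(cat "$pidfile")
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            kill "$pid"
        fi
        rm -f "$pidfile"
    fi
    # Mail the collected samples, if there are any
    if [ -s "$outfile" ]; then
        mail -s "cpu usage alert" "$recipient" < "$outfile"
    fi
}

# Run only when invoked as the script itself (not when sourced).
# RECIPIENT is a hypothetical variable; "root" is only a placeholder default.
case "$0" in
    *monit_top_stop.sh)
        stop_and_report /tmp/monit_top.pid /tmp/monit_top.out "${RECIPIENT:-root}" ;;
esac
```

Because the subject is quoted inside a real shell script, it can contain spaces without fighting monit's own quoting.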
In short: when user CPU usage goes above 70%, monit starts the script,
which records the load average, memory usage and the top 20 processes
by CPU every 5 seconds. When CPU usage drops back below the limit, the
script is killed and its output is mailed to [email protected].
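One thing to watch: if /tmp/monit_top.sh is ever started a second time while the first copy is still running (by hand, or if the exec action fires again), the pid file is overwritten and the recovery action can only kill the newer copy. A minimal sketch of a guard for the top of the script, assuming the pid file path from the example; the function name collector_running is hypothetical:

```shell
#!/bin/sh
# Sketch: detect an already running collector via its pid file.

# Succeeds if the pid file names a process that is still alive.
# Caveat: a stale pid reused by an unrelated process also counts as alive.
collector_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Intended use near the top of /tmp/monit_top.sh, before the exec lines:
#   collector_running /tmp/monit_top.pid && exit 0
```

Placing the check before the exec redirections matters, otherwise the second copy would already have truncated /tmp/monit_top.out.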
You can adapt the script as needed: collect additional information,
change the sleep interval, etc.
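For example, the loop body could be split into a function so that the interval and the number of listed processes become parameters. This is only a sketch: INTERVAL, NPROC, sample and collect are hypothetical names, the defaults match the original script (5 seconds, 20 processes), and ps is assumed to be the procps version shipped with Debian.

```shell
#!/bin/sh
# Hypothetical parameters; defaults match the original script
INTERVAL=${INTERVAL:-5}
NPROC=${NPROC:-20}

# Print one sample: load average, memory, and the $1 busiest processes
sample() {
    uptime
    free
    # %cpu, size, command name; highest CPU usage first
    ps --no-headers -A -o "%cpu sz ucomm" | sort -k1,1nr | head -n "$1"
    echo "#############################"
}

# Endless collection loop, as in /tmp/monit_top.sh
collect() {
    while true; do
        sample "$NPROC"
        sleep "$INTERVAL"
    done
}

# In /tmp/monit_top.sh, call collect after the exec and pid-file lines.
```

Extra commands (vmstat, iostat and so on) would simply go into sample.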
Martin
On May 18, 2009, at 10:06 AM, Pascal Legrand wrote:
Hello,
I have a problem with monit: I configured it to alert me when CPU
usage is too high, and I received this mail:
Resource limit matched Service Intranet
Date: Mon, 18 May 2009 04:13:24 +0200
Action: alert
Host: tamtam
Description: 'Intranet' cpu user usage of 70.4% matches resource
limit [cpu user usage>70.0%]
Resource limit matched Service Intranet
Date: Mon, 18 May 2009 04:13:25 +0200
Action: alert
Host: tamtam
Description: 'Intranet' loadavg(5min) of 2.2 matches resource limit
[loadavg(5min)>2.0]
But I can't see in the logs what happened.
Is it a bug in monit?
Here is some information about my server.
Thank you for your help.
Debian Lenny
------------------------------------------------------
Package: monit
Status: installed
Automatically installed: no
Version: 1:4.10.1-4
------------------------------------------------------
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 3
cpu MHz : 697.898
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse up
bogomips : 1397.51
clflush size : 32
power management:
------------------------------------------------------
Monit configuration
set daemon 60
set logfile syslog facility log_daemon
set mailserver smtp.bla.fr
set mail-format { from: [email protected] }
set alert [email protected]
set httpd port 2812 and
allow admin:test
# System
check system TamTam
if loadavg (1min) > 4 then alert
if loadavg (5min) > 2 then alert
if memory usage > 75% then alert
if cpu usage (user) > 70% then alert
if cpu usage (system) > 30% then alert
if cpu usage (wait) > 20% then alert
# Disk
check device root-hda1 with path /dev/hda1
if space usage > 85% then alert
check device home-hda9 with path /dev/hda9
if space usage > 85% then alert
check device tmp-hda8 with path /dev/hda8
if space usage > 85% then alert
check device usr-hda5 with path /dev/hda5
if space usage > 85% then alert
check device var-hda6 with path /dev/hda6
if space usage > 85% then alert
#ssh monitoring
check process sshd with pidfile /var/run/sshd.pid
start program "/etc/init.d/ssh start"
stop program "/etc/init.d/ssh stop"
if failed port 22 protocol ssh then alert
#dhcpd
check process dhcpd with pidfile /var/run/dhcpd.pid
start program "/etc/init.d/dhcp3-server start"
stop program "/etc/init.d/dhcp3-server stop"
if failed port 67 type udp then alert
#ldap
check process slapd with pidfile /var/run/slapd/slapd.pid
start program = "/etc/init.d/slapd start"
stop program = "/etc/init.d/slapd stop"
if failed port 389 protocol ldap3 then alert
#Smartd
check process smartd with pidfile /var/run/smartd.pid
start program = "/etc/init.d/smartmontools start"
stop program = "/etc/init.d/smartmontools stop"
if changed pid then alert
------------------------------------------------------
--
To unsubscribe:
http://lists.nongnu.org/mailman/listinfo/monit-general