Dear Monit community,
after upgrading from Debian 5 to Debian 6, Monit fails to monitor my
/var filesystem, which is located on a logical volume. It reports:
'var' unable to read filesystem /dev/dm-12 state
* I am using Monit version 5.1.1, which comes with Debian 6 (squeeze).
* /var is located on a logical volume at /dev/mapper/ister-var,
which is a symlink pointing to /dev/dm-12.
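For anyone who wants to reproduce the symlink resolution described above, here is a minimal sketch in Python. The helper name describe_device is made up for illustration; it only follows the symlink and checks whether the target is a block device, and says nothing about what Monit does internally.

```python
import os
import stat

def describe_device(path):
    """Return (resolved_path, is_block_device) for a device path.

    describe_device is a hypothetical helper, not part of Monit.
    """
    resolved = os.path.realpath(path)  # follows symlinks, e.g. to /dev/dm-12
    try:
        mode = os.stat(resolved).st_mode
    except OSError:
        # Target does not exist or cannot be read.
        return resolved, False
    return resolved, stat.S_ISBLK(mode)

# On the machine described here one would call:
#   describe_device("/dev/mapper/ister-var")
```

On our server I would expect the first element to be /dev/dm-12, since that is where the symlink points.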
The relevant portion of my /etc/monit/monitrc reads as follows:
check device var with path /dev/mapper/ister-var
if space usage > 75% then alert
if inode usage > 75% then alert
== Previous discussion on this mailing list ==
I have read
http://lists.nongnu.org/archive/html/monit-general/2010-05/msg00000.html
and none of the causes mentioned there seems to apply; to be precise:
* /var is mounted.
* We do not use SELinux.
* Our server is real, not virtual
(though it hosts virtual servers, using Xen).
monit -Iv does not seem to give any additional information compared to
the error message quoted above.
== Further Research ==
A similar monitoring rule is in effect for the root filesystem, which
resides on a physical rather than a logical volume; this works fine.
The relevant portion of /etc/monit/monitrc reads:
check device rootfs with path /dev/cciss/c0d0p2
if space usage > 75% then alert
if inode usage > 75% then alert
Additional testing with Python suggests that the statvfs() call succeeds
for /var, that is, the following script reports no errors:
import os
print(os.statvfs('/var'))
This prints:
posix.statvfs_result(f_bsize=4096, f_frsize=4096,
f_blocks=2064238, f_bfree=1698423, f_bavail=1593566, f_files=1048576,
f_ffree=1039052, f_favail=1039052, f_flag=4102, f_namemax=255)
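To relate these numbers to the 75% thresholds in the rule, the usage can be worked out from the statvfs fields. This is only a sketch: the formula (used = total minus free) is my assumption, and I do not know whether Monit computes its percentages in exactly this way. The Sample tuple is just a stand-in that carries the figures printed above.

```python
from collections import namedtuple

def usage_percent(st):
    """Return (space_percent, inode_percent) from statvfs-like fields.

    Assumes used = total - free; Monit's own formula may differ.
    """
    space = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    inodes = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return space, inodes

# Figures copied from the statvfs() output for /var.
Sample = namedtuple("Sample", "f_blocks f_bfree f_files f_ffree")
sample = Sample(f_blocks=2064238, f_bfree=1698423,
                f_files=1048576, f_ffree=1039052)

space, inodes = usage_percent(sample)
print("space usage: %.1f%%, inode usage: %.1f%%" % (space, inodes))
```

With the figures above this comes to roughly 17.7% space and 0.9% inode usage, so neither threshold should be anywhere near triggering. The same function also accepts the result of os.statvfs('/var') directly.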
== Additional Details of our Setup ==
/etc/mtab - at the time of the error messages - reads:
/dev/cciss/c0d0p2 / ext3 rw,errors=remount-ro 0 0
tmpfs /lib/init/rw tmpfs rw,nosuid,mode=0755,size=32M 0 0
proc /proc proc rw,noexec,nosuid,nodev 0 0
sysfs /sys sysfs rw,noexec,nosuid,nodev 0 0
udev /dev tmpfs rw,mode=0755 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,size=32M 0 0
devpts /dev/pts devpts rw,noexec,nosuid,gid=5,mode=620 0 0
/dev/cciss/c0d0p1 /boot ext3 rw 0 0
/dev/mapper/ister-home /home ext3 rw,nosuid,nodev,user_xattr 0 0
/dev/mapper/ister-hp /hp ext3 rw,nodev 0 0
/dev/mapper/ister-opt /opt ext3 rw,nodev 0 0
/dev/mapper/ister-tmp /tmp ext3 rw,nosuid,nodev,usrquota,grpquota,user_xattr 0 0
/dev/mapper/ister-usr /usr ext3 rw,nodev 0 0
/dev/mapper/ister-usr--local /usr/local ext3 rw,nodev 0 0
/dev/mapper/ister-var /var ext3 rw,nosuid,nodev,usrquota,grpquota,user_xattr 0 0
xenfs /proc/xen xenfs rw 0 0
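Note that /etc/mtab lists /var under the name /dev/mapper/ister-var, whereas the error message mentions /dev/dm-12. The following sketch shows how the two names could be compared. Whether Monit 5.1.1 performs any such lookup is an assumption on my part, and the helper names find_mount and same_device are made up for illustration.

```python
import os

def find_mount(mtab_text, mountpoint):
    """Return the device field of the mtab entry for mountpoint, or None."""
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mountpoint:
            return fields[0]
    return None

def same_device(a, b):
    """True if both paths resolve to the same file after following symlinks."""
    return os.path.realpath(a) == os.path.realpath(b)

# Two entries copied from the mtab listing above.
sample = (
    "/dev/cciss/c0d0p2 / ext3 rw,errors=remount-ro 0 0\n"
    "/dev/mapper/ister-var /var ext3 "
    "rw,nosuid,nodev,usrquota,grpquota,user_xattr 0 0\n"
)

device = find_mount(sample, "/var")
print(device)                  # the name as recorded in mtab
print(device == "/dev/dm-12")  # a plain string comparison does not match
```

On the server itself, same_device(device, "/dev/dm-12") should be true if the symlink is as described, even though the plain string comparison is not.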
Please find our complete /etc/monit/monitrc (monitrc) and the complete
output of monit -Iv (monit.log) attached.
Versions:
Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-30)
([email protected]) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed
Jan 12 05:46:49 UTC 2011
LVM version: 2.02.66(2) (2010-05-20)
Library version: 1.02.48 (2010-05-20)
Driver version: 4.15.0
Thanks a lot for your help,
Odin Kroeger
Runtime constants:
Control file = /etc/monit/monitrc
Log file = syslog
Pid file = /var/run/monit.pid
Debug = True
Log = True
Use syslog = True
Is Daemon = True
Use process engine = True
Poll time = 120 seconds with start delay 0 seconds
Expect buffer = 256 bytes
Event queue = base directory /var/monit with 100 slots
Mail server(s) = localhost:25 with timeout 5 seconds
Mail from = monit@<OURSERVER>
Mail subject = monit alert -- $EVENT $SERVICE
Mail message = $EVENT Service $SERV..(truncated)
Start monit httpd = False
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = All events
The service list contains the following entries:
System Name = <OURSERVER>
Monitoring mode = active
Load avg. (15min) = if greater than 450.0 1 times within 1 cycle(s) then
exec '/usr/local/sbin/ister-monit-reboot' timeout 0 cycle(s) else if succeeded
1 times within 1 cycle(s) then alert
CPU wait limit = if greater than 30.0% 1 times within 1 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
CPU system limit = if greater than 30.0% 1 times within 1 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
CPU user limit = if greater than 70.0% 1 times within 1 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
Memory usage limit = if greater than 90.0% 5 times within 5 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
Load avg. (5min) = if greater than 3.0 1 times within 1 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
Load avg. (1min) = if greater than 5.0 1 times within 1 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
Process Name = sshd
Pid file = /var/run/sshd.pid
Monitoring mode = active
Start program = '/etc/init.d/ssh start' timeout 30 second(s)
Stop program = '/etc/init.d/ssh stop' timeout 30 second(s)
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Children = If greater than 50 1 times within 1 cycle(s) then
restart else if succeeded 1 times within 1 cycle(s) then alert
Memory amount limit (incl. children) = If greater than 153600 5 times within 5
cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
Timeout = If restarted 3 times within 5 cycle(s) then unmonitor
Process Name = exim4
Pid file = /var/run/exim4/exim.pid
Monitoring mode = active
Start program = '/etc/init.d/exim4 start' timeout 30 second(s)
Stop program = '/etc/init.d/exim4 stop' timeout 30 second(s)
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Port = if failed localhost:25 [SMTP via TCP] with timeout 5
seconds 5 times within 5 cycle(s) then restart else if succeeded 1 times within
1 cycle(s) then alert
Memory amount limit (incl. children) = If greater than 256000 3 times within 3
cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
Timeout = If restarted 3 times within 5 cycle(s) then unmonitor
Process Name = syslogd
Pid file = /var/run/syslogd.pid
Monitoring mode = active
Start program = '/etc/init.d/sysklogd start' timeout 30 second(s)
Stop program = '/etc/init.d/sysklogd stop' timeout 30 second(s)
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Timeout = If restarted 3 times within 5 cycle(s) then unmonitor
Process Name = atd
Pid file = /var/run/atd.pid
Monitoring mode = active
Start program = '/etc/init.d/atd start' timeout 30 second(s)
Stop program = '/etc/init.d/atd stop' timeout 30 second(s)
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Timeout = If restarted 3 times within 5 cycle(s) then unmonitor
Process Name = cron
Pid file = /var/run/crond.pid
Monitoring mode = active
Start program = '/etc/init.d/cron start' timeout 30 second(s)
Stop program = '/etc/init.d/cron stop' timeout 30 second(s)
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Timeout = If restarted 3 times within 5 cycle(s) then unmonitor
Process Name = cpqarrayd
Pid file = /var/run/cpqarrayd.pid
Monitoring mode = active
Start program = '/etc/init.d/cpqarrayd start' timeout 30 second(s)
Stop program = '/etc/init.d/cpqarrayd stop' timeout 30 second(s)
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert
Timeout = If restarted 3 times within 5 cycle(s) then unmonitor
Filesystem Name = rootfs
Path = /dev/cciss/c0d0p2
Monitoring mode = active
Filesystem flags = if changed 1 times within 1 cycle(s) then alert
Inodes usage limit = if greater than 75.0% 1 times within 1 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
Space usage limit = if greater than 75.0% 1 times within 1 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
Filesystem Name = var
Path = /dev/mapper/ister-var
Monitoring mode = active
Filesystem flags = if changed 1 times within 1 cycle(s) then alert
Inodes usage limit = if greater than 75.0% 1 times within 1 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
Space usage limit = if greater than 75.0% 1 times within 1 cycle(s) then
alert else if succeeded 1 times within 1 cycle(s) then alert
File Name = inittab
Path = /etc/inittab
Monitoring mode = active
Checksum = if failed 7c2de0b7f96416a808547aa70f1b78b2(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = twcfg.txt
Path = /etc/tripwire/twcfg.txt
Monitoring mode = active
Checksum = if failed 84ed8dd95a7d1fbcca56f0448ff262c1(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = tw.cfg
Path = /etc/tripwire/tw.cfg
Monitoring mode = active
Checksum = if failed d5c57e11c71d55dc3d3da3e09a9c5c00(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = tw.pol
Path = /etc/tripwire/tw.pol
Monitoring mode = active
Checksum = if failed 3a5d2f8fee033fc91cbdb72c1368073a(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = twcfg.txt.dpkg-dist
Path = /etc/tripwire/twcfg.txt.dpkg-dist
Monitoring mode = active
Checksum = if failed 1821c7a0d207a168f1d7c766f238e816(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = twpol.txt
Path = /etc/tripwire/twpol.txt
Monitoring mode = active
Checksum = if failed c7967470f1e743670ef5cc7ae6e54b7a(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = site.key
Path = /etc/tripwire/site.key
Monitoring mode = active
Checksum = if failed a6868f80d7b4b088e1a5570d8350e9d8(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = <OURSERVER>-local.key
Path = /etc/tripwire/<OURSERVER>-local.key
Monitoring mode = active
Checksum = if failed 2f9743e8d6c5c1b808c87e7be1f258a9(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = monitrc
Path = /etc/monit/monitrc
Monitoring mode = active
Checksum = if failed b22c109670d82b38209117349bd015f3(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = reboot.sh
Path = /etc/monit/reboot.sh
Monitoring mode = active
Checksum = if failed f0fc128f0cb4644b4b1ad0aba3852e3e(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = monitrc.dpkg-dist
Path = /etc/monit/monitrc.dpkg-dist
Monitoring mode = active
Checksum = if failed d36ef8699889d514312e912d9bb7f875(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = tripwire
Path = /usr/sbin/tripwire
Monitoring mode = active
Checksum = if failed acf6f195c6fa8767532f8546247f7456(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = twadmin
Path = /usr/sbin/twadmin
Monitoring mode = active
Checksum = if failed d93f2df1c8babe07b299c95b85774855(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = twprint
Path = /usr/sbin/twprint
Monitoring mode = active
Checksum = if failed 6f0f1161eb7c7bfb340c9088762fd54b(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = monit
Path = /usr/sbin/monit
Monitoring mode = active
Checksum = if failed a28ea4c214026bbcfba7e2a1adba1e83(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libc.so.6
Path = /lib/libc.so.6
Monitoring mode = active
Checksum = if failed 4519e46ef73991d57454b42fb494fa1d(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libssl.so.0.9.8
Path = /usr/lib/libssl.so.0.9.8
Monitoring mode = active
Checksmonit: pidfile '/var/run/monit.pid' does not exist
Starting monit daemon
'<OURSERVER>' Monit started
Monit instance changed notification is sent to <LOGGING-EMAIL-ADDRESS>
Processing postponed events queue
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=-1.0%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=-1.0%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=-1.0%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=11]
'sshd' total mem amount check succeeded [current total mem amount=31808kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
Data access error notification is sent to <LOGGING-EMAIL-ADDRESS>
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=0.1%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=0.3%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=0.4%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=11]
'sshd' total mem amount check succeeded [current total mem amount=31888kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=0.0%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=0.0%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=0.0%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=11]
'sshd' total mem amount check succeeded [current total mem amount=31904kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=0.2%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=0.0%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=0.0%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=12]
'sshd' total mem amount check succeeded [current total mem amount=36880kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=0.0%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=0.0%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=0.0%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=11]
'sshd' total mem amount check succeeded [current total mem amount=31932kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
monit daemon with pid [21669] killed
'<OURSERVER>' Monit stopped
Monit instance changed notification is sent to <LOGGING-EMAIL-ADDRESS>
um = if failed 304a4b22a952170ba2b7f8e1b698fe27(MD5) 1 times within
1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = ld-linux-x86-64.so.2
Path = /lib64/ld-linux-x86-64.so.2
Monitoring mode = active
Checksum = if failed a8c5bb432ee34b71eeb8dd6a8a8e0564(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libcrypto.so.0.9.8
Path = /usr/lib/libcrypto.so.0.9.8
Monitoring mode = active
Checksum = if failed c14235b28c8c7440a393fb4a28fec790(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libpthread.so.0
Path = /lib/libpthread.so.0
Monitoring mode = active
Checksum = if failed d578c7228e9905d8a29c581f471b74b4(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libdl.so.2
Path = /lib/libdl.so.2
Monitoring mode = active
Checksum = if failed da61f40fe74337752c52ebd96d8d9086(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libz.so.1
Path = /usr/lib/libz.so.1
Monitoring mode = active
Checksum = if failed 51cb8af10bde5d4deeb132f88f65824b(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libcrypt.so.1
Path = /lib/libcrypt.so.1
Monitoring mode = active
Checksum = if failed 92fe1ebaa19eee18dd58756de8c65cfb(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libpam.so.0
Path = /lib/libpam.so.0
Monitoring mode = active
Checksum = if failed 63efd0cfdf9d5094ba9dbb7d1715aba4(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libnsl.so.1
Path = /lib/libnsl.so.1
Monitoring mode = active
Checksum = if failed 7737a30044b12b40359916e8a20f3f98(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
File Name = libresolv.so.2
Path = /lib/libresolv.so.2
Monitoring mode = active
Checksum = if failed 54be7e80b7a86930c80ea9aaa4975a30(MD5) 1 times
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s)
then alert
Alert mail to = <LOGGING-EMAIL-ADDRESS>
Alert on = Checksum
-------------------------------------------------------------------------------
set daemon 120
set logfile syslog facility log_daemon
set mailserver localhost
set eventqueue
basedir /var/monit
slots 100
set mail-format { from: monit@<OURSERVER> }
set alert <LOGGING-EMAIL-ADDRESS>
check system <OURSERVER>
if loadavg (1min) > 5 then alert
if loadavg (5min) > 3 then alert
if memory usage > 90% for 5 cycles then alert
if cpu usage (user) > 70% then alert
if cpu usage (system) > 30% then alert
if cpu usage (wait) > 30% then alert
if loadavg (15min) > 450 then exec /usr/local/sbin/ister-monit-reboot
check process sshd with pidfile /var/run/sshd.pid
start program = "/etc/init.d/ssh start"
stop program = "/etc/init.d/ssh stop"
if totalmem > 150.0 MB for 5 cycles then restart
if children > 50 then restart
if 3 restarts within 5 cycles then timeout
check process exim4 with pidfile /var/run/exim4/exim.pid
start program = "/etc/init.d/exim4 start"
stop program = "/etc/init.d/exim4 stop"
if totalmem > 250.0 MB for 3 cycles then restart
if failed host localhost port 25 protocol smtp
within 5 cycles then restart
if 3 restarts within 5 cycles then timeout
check process syslogd with pidfile /var/run/syslogd.pid
start program = "/etc/init.d/sysklogd start"
stop program = "/etc/init.d/sysklogd stop"
if 3 restarts within 5 cycles then timeout
check process atd with pidfile /var/run/atd.pid
start program = "/etc/init.d/atd start"
stop program = "/etc/init.d/atd stop"
if 3 restarts within 5 cycles then timeout
check process cron with pidfile /var/run/crond.pid
start program = "/etc/init.d/cron start"
stop program = "/etc/init.d/cron stop"
if 3 restarts within 5 cycles then timeout
check process cpqarrayd with pidfile /var/run/cpqarrayd.pid
start program = "/etc/init.d/cpqarrayd start"
stop program = "/etc/init.d/cpqarrayd stop"
if 3 restarts within 5 cycles then timeout
check device rootfs with path /dev/cciss/c0d0p2
if space usage > 75% then alert
if inode usage > 75% then alert
check device var with path /dev/mapper/ister-var
if space usage > 75% then alert
if inode usage > 75% then alert
include /etc/monit/conf.d/*