Dear Monit community,

After upgrading from Debian 5 to Debian 6, Monit fails to monitor my
/var filesystem, which is located on a logical volume. It reports:

  'var' unable to read filesystem /dev/dm-12 state

* I am using Monit version 5.1.1, which comes with Debian 6 (squeeze).
* /var is located on a logical volume at /dev/mapper/ister-var,
which is a symlink pointing to /dev/dm-12.
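The symlink can also be inspected from Python. The sketch below prints what each configured device path resolves to. The helper name describe_device is made up for illustration. On this machine, /dev/mapper/ister-var would be expected to resolve to /dev/dm-12, which is the name that appears in Monit's error message.

```python
import os
import sys


def describe_device(path):
    """Return (path, resolved path, is_symlink) for a device path.

    os.path.realpath follows every symlink in the path, so a
    device-mapper alias such as /dev/mapper/<vg>-<lv> resolves to the
    underlying /dev/dm-N node.
    """
    return path, os.path.realpath(path), os.path.islink(path)


if __name__ == "__main__":
    # Device paths taken from the monitrc in this message; pass others as arguments.
    paths = sys.argv[1:] or ["/dev/mapper/ister-var", "/dev/cciss/c0d0p2"]
    for p in paths:
        path, resolved, is_link = describe_device(p)
        print("%s -> %s (symlink: %s)" % (path, resolved, is_link))
```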

The relevant portion of my /etc/monit/monitrc reads as follows:

  check device var with path /dev/mapper/ister-var
          if space usage > 75% then alert
          if inode usage > 75% then alert


== Previous discussion on this mailing list ==

I have read
http://lists.nongnu.org/archive/html/monit-general/2010-05/msg00000.html
and none of the causes mentioned there seems to apply; to be precise:

* /var is mounted.
* We do not use SELinux.
* Our server is a physical machine, not a virtual one
(though it hosts virtual machines using Xen).
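The first point can be checked independently of /etc/mtab with os.path.ismount() from Python's standard library. This is a minimal sketch, and the list of paths simply covers the two filesystems discussed here:

```python
import os

# Mount points mentioned in this message; adjust as needed.
PATHS = ["/", "/var"]


def mount_status(paths):
    """Map each path to True if it is a mount point, per os.path.ismount."""
    return {path: os.path.ismount(path) for path in paths}


if __name__ == "__main__":
    for path, mounted in sorted(mount_status(PATHS).items()):
        print("%-6s mounted: %s" % (path, mounted))
```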

monit -Iv does not seem to give any additional information compared to
the error message quoted above.


== Further Research ==

A similar monitoring rule is in effect for the root filesystem, which
resides on a plain partition rather than on a logical volume; this rule
works fine. The relevant portion of /etc/monit/monitrc reads:

  check device rootfs with path /dev/cciss/c0d0p2
          if space usage > 75% then alert
          if inode usage > 75% then alert


Additional testing with Python suggests that the statvfs() call succeeds
for /var; the following script runs without errors:

  import os
  # statvfs() on the mount point /var, not on the device node
  print(os.statvfs('/var'))

This prints:

  posix.statvfs_result(f_bsize=4096, f_frsize=4096,
  f_blocks=2064238, f_bfree=1698423, f_bavail=1593566, f_files=1048576,
  f_ffree=1039052, f_favail=1039052, f_flag=4102, f_namemax=255)
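The statvfs values above can be converted into usage percentages for comparison with the 75% limits. The formulas here are an assumption made for illustration: used = total minus free, based on f_bfree and f_ffree. Monit's own calculation may differ, for example by using f_bavail. With the numbers above, these formulas give roughly 17.7% space usage and 0.9% inode usage.

```python
import os
from collections import namedtuple

# The subset of statvfs fields used below, filled with the values printed above.
StatVFS = namedtuple("StatVFS", "f_blocks f_bfree f_files f_ffree")
VAR_SAMPLE = StatVFS(f_blocks=2064238, f_bfree=1698423, f_files=1048576, f_ffree=1039052)


def space_usage_percent(st):
    """Percentage of blocks in use, counting blocks reserved for root as free."""
    return 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0


def inode_usage_percent(st):
    """Percentage of inodes in use."""
    return 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0


if __name__ == "__main__":
    print("sample: space %.1f%%, inodes %.1f%%"
          % (space_usage_percent(VAR_SAMPLE), inode_usage_percent(VAR_SAMPLE)))
    if os.path.isdir("/var"):
        live = os.statvfs("/var")  # the live filesystem mounted on /var
        print("live:   space %.1f%%, inodes %.1f%%"
              % (space_usage_percent(live), inode_usage_percent(live)))
```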


== Additional Details of our Setup ==

At the time of the error messages, /etc/mtab read:

  /dev/cciss/c0d0p2 / ext3 rw,errors=remount-ro 0 0
  tmpfs /lib/init/rw tmpfs rw,nosuid,mode=0755,size=32M 0 0
  proc /proc proc rw,noexec,nosuid,nodev 0 0
  sysfs /sys sysfs rw,noexec,nosuid,nodev 0 0
  udev /dev tmpfs rw,mode=0755 0 0
  tmpfs /dev/shm tmpfs rw,nosuid,nodev,size=32M 0 0
  devpts /dev/pts devpts rw,noexec,nosuid,gid=5,mode=620 0 0
  /dev/cciss/c0d0p1 /boot ext3 rw 0 0
  /dev/mapper/ister-home /home ext3 rw,nosuid,nodev,user_xattr 0 0
  /dev/mapper/ister-hp /hp ext3 rw,nodev 0 0
  /dev/mapper/ister-opt /opt ext3 rw,nodev 0 0
  /dev/mapper/ister-tmp /tmp ext3 rw,nosuid,nodev,usrquota,grpquota,user_xattr 0 0
  /dev/mapper/ister-usr /usr ext3 rw,nodev 0 0
  /dev/mapper/ister-usr--local /usr/local ext3 rw,nodev 0 0
  /dev/mapper/ister-var /var ext3 rw,nosuid,nodev,usrquota,grpquota,user_xattr 0 0
  xenfs /proc/xen xenfs rw 0 0
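One hypothesis that fits the error message is the following. The message names /dev/dm-12, not the configured /dev/mapper/ister-var. The device path may therefore be canonicalized before it is looked up in /etc/mtab, where /var is listed under its /dev/mapper name. The sketch below is not Monit's code. It shows that such a lookup fails when only the queried path is resolved, and succeeds when both sides are resolved. The function name mount_point_for is hypothetical.

```python
import os


def mount_point_for(device, mtab_text, resolve_mtab=True):
    """Return the mount point listed for `device` in mtab-formatted text, or None.

    The queried device is always canonicalized with os.path.realpath.
    With resolve_mtab=False the first mtab field is compared verbatim,
    so a symlinked name such as /dev/mapper/<vg>-<lv> never matches its
    resolved /dev/dm-N target.
    """
    wanted = os.path.realpath(device)
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        listed = fields[0]
        if resolve_mtab:
            listed = os.path.realpath(listed)
        if listed == wanted:
            return fields[1]
    return None


if __name__ == "__main__":
    if os.path.exists("/etc/mtab"):
        with open("/etc/mtab") as f:
            mtab = f.read()
        for dev in ("/dev/mapper/ister-var", "/dev/cciss/c0d0p2"):
            print("%s: verbatim match -> %s, resolved match -> %s"
                  % (dev, mount_point_for(dev, mtab, resolve_mtab=False),
                     mount_point_for(dev, mtab)))
```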

Please find our complete /etc/monit/monitrc (monitrc) and the complete
output of monit -Iv (monit.log) attached.


Versions:

Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-30)
([email protected]) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed
Jan 12 05:46:49 UTC 2011

LVM version:     2.02.66(2) (2010-05-20)
Library version: 1.02.48 (2010-05-20)
Driver version:  4.15.0


Thanks a lot for your help,

Odin Kroeger

-------------------------------------------------------------------------------
Runtime constants:
 Control file       = /etc/monit/monitrc
 Log file           = syslog
 Pid file           = /var/run/monit.pid
 Debug              = True
 Log                = True
 Use syslog         = True
 Is Daemon          = True
 Use process engine = True
 Poll time          = 120 seconds with start delay 0 seconds
 Expect buffer      = 256 bytes
 Event queue        = base directory /var/monit with 100 slots
 Mail server(s)     = localhost:25 with timeout 5 seconds
 Mail from          = monit@<OURSERVER>
 Mail subject       = monit alert --  $EVENT $SERVICE
 Mail message       = $EVENT Service $SERV..(truncated)
 Start monit httpd  = False
 Alert mail to      = <LOGGING-EMAIL-ADDRESS>
   Alert on         = All events

The service list contains the following entries:

System Name           = <OURSERVER>
 Monitoring mode      = active
 Load avg. (15min)    = if greater than 450.0 1 times within 1 cycle(s) then 
exec '/usr/local/sbin/ister-monit-reboot' timeout 0 cycle(s) else if succeeded 
1 times within 1 cycle(s) then alert
 CPU wait limit       = if greater than 30.0% 1 times within 1 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert
 CPU system limit     = if greater than 30.0% 1 times within 1 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert
 CPU user limit       = if greater than 70.0% 1 times within 1 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert
 Memory usage limit   = if greater than 90.0% 5 times within 5 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert
 Load avg. (5min)     = if greater than 3.0 1 times within 1 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert
 Load avg. (1min)     = if greater than 5.0 1 times within 1 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert

Process Name          = sshd
 Pid file             = /var/run/sshd.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/ssh start' timeout 30 second(s)
 Stop program         = '/etc/init.d/ssh stop' timeout 30 second(s)
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Children             = If greater than 50 1 times within 1 cycle(s) then 
restart else if succeeded 1 times within 1 cycle(s) then alert
 Memory amount limit (incl. children) = If greater than 153600 5 times within 5 
cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Timeout              = If restarted 3 times within 5 cycle(s) then unmonitor

Process Name          = exim4
 Pid file             = /var/run/exim4/exim.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/exim4 start' timeout 30 second(s)
 Stop program         = '/etc/init.d/exim4 stop' timeout 30 second(s)
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Port                 = if failed localhost:25 [SMTP via TCP] with timeout 5 
seconds 5 times within 5 cycle(s) then restart else if succeeded 1 times within 
1 cycle(s) then alert
 Memory amount limit (incl. children) = If greater than 256000 3 times within 3 
cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Timeout              = If restarted 3 times within 5 cycle(s) then unmonitor

Process Name          = syslogd
 Pid file             = /var/run/syslogd.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/sysklogd start' timeout 30 second(s)
 Stop program         = '/etc/init.d/sysklogd stop' timeout 30 second(s)
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Timeout              = If restarted 3 times within 5 cycle(s) then unmonitor

Process Name          = atd
 Pid file             = /var/run/atd.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/atd start' timeout 30 second(s)
 Stop program         = '/etc/init.d/atd stop' timeout 30 second(s)
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Timeout              = If restarted 3 times within 5 cycle(s) then unmonitor

Process Name          = cron
 Pid file             = /var/run/crond.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/cron start' timeout 30 second(s)
 Stop program         = '/etc/init.d/cron stop' timeout 30 second(s)
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Timeout              = If restarted 3 times within 5 cycle(s) then unmonitor

Process Name          = cpqarrayd
 Pid file             = /var/run/cpqarrayd.pid
 Monitoring mode      = active
 Start program        = '/etc/init.d/cpqarrayd start' timeout 30 second(s)
 Stop program         = '/etc/init.d/cpqarrayd stop' timeout 30 second(s)
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert
 Timeout              = If restarted 3 times within 5 cycle(s) then unmonitor

Filesystem Name       = rootfs
 Path                 = /dev/cciss/c0d0p2
 Monitoring mode      = active
 Filesystem flags     = if changed 1 times within 1 cycle(s) then alert
 Inodes usage limit   = if greater than 75.0% 1 times within 1 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert
 Space usage limit    = if greater than 75.0% 1 times within 1 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert

Filesystem Name       = var
 Path                 = /dev/mapper/ister-var
 Monitoring mode      = active
 Filesystem flags     = if changed 1 times within 1 cycle(s) then alert
 Inodes usage limit   = if greater than 75.0% 1 times within 1 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert
 Space usage limit    = if greater than 75.0% 1 times within 1 cycle(s) then 
alert else if succeeded 1 times within 1 cycle(s) then alert

File Name             = inittab
 Path                 = /etc/inittab
 Monitoring mode      = active
 Checksum             = if failed 7c2de0b7f96416a808547aa70f1b78b2(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = twcfg.txt
 Path                 = /etc/tripwire/twcfg.txt
 Monitoring mode      = active
 Checksum             = if failed 84ed8dd95a7d1fbcca56f0448ff262c1(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = tw.cfg
 Path                 = /etc/tripwire/tw.cfg
 Monitoring mode      = active
 Checksum             = if failed d5c57e11c71d55dc3d3da3e09a9c5c00(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = tw.pol
 Path                 = /etc/tripwire/tw.pol
 Monitoring mode      = active
 Checksum             = if failed 3a5d2f8fee033fc91cbdb72c1368073a(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = twcfg.txt.dpkg-dist
 Path                 = /etc/tripwire/twcfg.txt.dpkg-dist
 Monitoring mode      = active
 Checksum             = if failed 1821c7a0d207a168f1d7c766f238e816(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = twpol.txt
 Path                 = /etc/tripwire/twpol.txt
 Monitoring mode      = active
 Checksum             = if failed c7967470f1e743670ef5cc7ae6e54b7a(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = site.key
 Path                 = /etc/tripwire/site.key
 Monitoring mode      = active
 Checksum             = if failed a6868f80d7b4b088e1a5570d8350e9d8(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = <OURSERVER>-local.key
 Path                 = /etc/tripwire/<OURSERVER>-local.key
 Monitoring mode      = active
 Checksum             = if failed 2f9743e8d6c5c1b808c87e7be1f258a9(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = monitrc
 Path                 = /etc/monit/monitrc
 Monitoring mode      = active
 Checksum             = if failed b22c109670d82b38209117349bd015f3(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = reboot.sh
 Path                 = /etc/monit/reboot.sh
 Monitoring mode      = active
 Checksum             = if failed f0fc128f0cb4644b4b1ad0aba3852e3e(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = monitrc.dpkg-dist
 Path                 = /etc/monit/monitrc.dpkg-dist
 Monitoring mode      = active
 Checksum             = if failed d36ef8699889d514312e912d9bb7f875(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = tripwire
 Path                 = /usr/sbin/tripwire
 Monitoring mode      = active
 Checksum             = if failed acf6f195c6fa8767532f8546247f7456(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = twadmin
 Path                 = /usr/sbin/twadmin
 Monitoring mode      = active
 Checksum             = if failed d93f2df1c8babe07b299c95b85774855(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = twprint
 Path                 = /usr/sbin/twprint
 Monitoring mode      = active
 Checksum             = if failed 6f0f1161eb7c7bfb340c9088762fd54b(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = monit
 Path                 = /usr/sbin/monit
 Monitoring mode      = active
 Checksum             = if failed a28ea4c214026bbcfba7e2a1adba1e83(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libc.so.6
 Path                 = /lib/libc.so.6
 Monitoring mode      = active
 Checksum             = if failed 4519e46ef73991d57454b42fb494fa1d(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libssl.so.0.9.8
 Path                 = /usr/lib/libssl.so.0.9.8
 Monitoring mode      = active
 Checksum             = if failed 304a4b22a952170ba2b7f8e1b698fe27(MD5) 1 times within 
1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = ld-linux-x86-64.so.2
 Path                 = /lib64/ld-linux-x86-64.so.2
 Monitoring mode      = active
 Checksum             = if failed a8c5bb432ee34b71eeb8dd6a8a8e0564(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libcrypto.so.0.9.8
 Path                 = /usr/lib/libcrypto.so.0.9.8
 Monitoring mode      = active
 Checksum             = if failed c14235b28c8c7440a393fb4a28fec790(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libpthread.so.0
 Path                 = /lib/libpthread.so.0
 Monitoring mode      = active
 Checksum             = if failed d578c7228e9905d8a29c581f471b74b4(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libdl.so.2
 Path                 = /lib/libdl.so.2
 Monitoring mode      = active
 Checksum             = if failed da61f40fe74337752c52ebd96d8d9086(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libz.so.1
 Path                 = /usr/lib/libz.so.1
 Monitoring mode      = active
 Checksum             = if failed 51cb8af10bde5d4deeb132f88f65824b(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libcrypt.so.1
 Path                 = /lib/libcrypt.so.1
 Monitoring mode      = active
 Checksum             = if failed 92fe1ebaa19eee18dd58756de8c65cfb(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libpam.so.0
 Path                 = /lib/libpam.so.0
 Monitoring mode      = active
 Checksum             = if failed 63efd0cfdf9d5094ba9dbb7d1715aba4(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libnsl.so.1
 Path                 = /lib/libnsl.so.1
 Monitoring mode      = active
 Checksum             = if failed 7737a30044b12b40359916e8a20f3f98(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

File Name             = libresolv.so.2
 Path                 = /lib/libresolv.so.2
 Monitoring mode      = active
 Checksum             = if failed 54be7e80b7a86930c80ea9aaa4975a30(MD5) 1 times 
within 1 cycle(s) then unmonitor else if succeeded 1 times within 1 cycle(s) 
then alert
 Alert mail to        = <LOGGING-EMAIL-ADDRESS>
   Alert on           = Checksum 

monit: pidfile '/var/run/monit.pid' does not exist
Starting monit daemon
'<OURSERVER>' Monit started
Monit instance changed notification is sent to <LOGGING-EMAIL-ADDRESS>
Processing postponed events queue
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=-1.0%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=-1.0%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=-1.0%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=11]
'sshd' total mem amount check succeeded [current total mem amount=31808kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
Data access error notification is sent to <LOGGING-EMAIL-ADDRESS>
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=0.1%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=0.3%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=0.4%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=11]
'sshd' total mem amount check succeeded [current total mem amount=31888kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=0.0%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=0.0%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=0.0%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=11]
'sshd' total mem amount check succeeded [current total mem amount=31904kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=0.2%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=0.0%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=0.0%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=12]
'sshd' total mem amount check succeeded [current total mem amount=36880kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
'<OURSERVER>' loadavg(15min) check succeeded [current loadavg(15min)=0.0]
'<OURSERVER>' cpu wait usage check succeeded [current cpu wait usage=0.0%]
'<OURSERVER>' cpu system usage check succeeded [current cpu system usage=0.0%]
'<OURSERVER>' cpu user usage check succeeded [current cpu user usage=0.0%]
'<OURSERVER>' mem usage check succeeded [current mem usage=18.0%]
'<OURSERVER>' loadavg(5min) check succeeded [current loadavg(5min)=0.0]
'<OURSERVER>' loadavg(1min) check succeeded [current loadavg(1min)=0.0]
'sshd' zombie check succeeded [status_flag=0000]
'sshd' children check succeeded [current children=11]
'sshd' total mem amount check succeeded [current total mem amount=31932kB]
'exim4' zombie check succeeded [status_flag=0000]
'exim4' total mem amount check succeeded [current total mem amount=1056kB]
'exim4' succeeded connecting to INET[localhost:25] via TCP
'exim4' succeeded testing protocol [SMTP] at INET[localhost:25] via TCP
'syslogd' zombie check succeeded [status_flag=0000]
'atd' zombie check succeeded [status_flag=0000]
'cron' zombie check succeeded [status_flag=0000]
'cpqarrayd' zombie check succeeded [status_flag=0000]
'rootfs' inode usage check succeeded [current inode usage=3.6%]
'rootfs' space usage check succeeded [current space usage=17.7%]
'var' unable to read filesystem /dev/dm-12 state
'twcfg.txt' file existence check succeeded
'twcfg.txt' is a regular file
'twcfg.txt' has valid checksums
'tw.cfg' file existence check succeeded
'tw.cfg' is a regular file
'tw.cfg' has valid checksums
'tw.pol' file existence check succeeded
'tw.pol' is a regular file
'tw.pol' has valid checksums
'twcfg.txt.dpkg-dist' file existence check succeeded
'twcfg.txt.dpkg-dist' is a regular file
'twcfg.txt.dpkg-dist' has valid checksums
'twpol.txt' file existence check succeeded
'twpol.txt' is a regular file
'twpol.txt' has valid checksums
'site.key' file existence check succeeded
'site.key' is a regular file
'site.key' has valid checksums
'<OURSERVER>-local.key' file existence check succeeded
'<OURSERVER>-local.key' is a regular file
'<OURSERVER>-local.key' has valid checksums
'reboot.sh' file doesn't exist
'reboot.sh' trying to restart
Monitoring disabled -- service reboot.sh
Monitoring enabled -- service reboot.sh
'libpam.so.0' file existence check succeeded
'libpam.so.0' is a regular file
'libpam.so.0' has valid checksums
monit daemon with pid [21669] killed
'<OURSERVER>' Monit stopped
Monit instance changed notification is sent to <LOGGING-EMAIL-ADDRESS>

-------------------------------------------------------------------------------
set daemon 120 
set logfile syslog facility log_daemon 
set mailserver          localhost

set eventqueue
     basedir /var/monit
     slots 100

set mail-format { from: monit@<OURSERVER> }

set alert <LOGGING-EMAIL-ADDRESS> 

check system <OURSERVER> 
    if loadavg (1min) > 5 then alert
    if loadavg (5min) > 3 then alert
    if memory usage > 90% for 5 cycles then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 30% then alert
    if loadavg (15min) > 450 then exec /usr/local/sbin/ister-monit-reboot

        check process sshd with pidfile /var/run/sshd.pid
                start program = "/etc/init.d/ssh start"
                stop program = "/etc/init.d/ssh stop"
                if totalmem > 150.0 MB for 5 cycles then restart
                if children > 50 then restart
                if 3 restarts within 5 cycles then timeout

        check process exim4 with pidfile /var/run/exim4/exim.pid 
                start program = "/etc/init.d/exim4 start"
                stop program = "/etc/init.d/exim4 stop"
                if totalmem > 250.0 MB for 3 cycles then restart
                if failed host localhost port 25 protocol smtp
                        within 5 cycles then restart
                if 3 restarts within 5 cycles then timeout
        
        check process syslogd with pidfile /var/run/syslogd.pid
                start program = "/etc/init.d/sysklogd start"
                stop program = "/etc/init.d/sysklogd stop"
                if 3 restarts within 5 cycles then timeout
        
        check process atd with pidfile /var/run/atd.pid
                start program = "/etc/init.d/atd start"
                stop program = "/etc/init.d/atd stop"
                if 3 restarts within 5 cycles then timeout
        
        check process cron with pidfile /var/run/crond.pid
                start program = "/etc/init.d/cron start"
                stop program = "/etc/init.d/cron stop"
                if 3 restarts within 5 cycles then timeout

        check process cpqarrayd with pidfile /var/run/cpqarrayd.pid
                start program = "/etc/init.d/cpqarrayd start"
                stop program = "/etc/init.d/cpqarrayd stop"
                if 3 restarts within 5 cycles then timeout

        check device rootfs with path /dev/cciss/c0d0p2
                if space usage > 75% then alert
                if inode usage > 75% then alert
        
        check device var with path /dev/mapper/ister-var
                if space usage > 75% then alert
                if inode usage > 75% then alert

include /etc/monit/conf.d/*
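A small script can list which filesystem checks in a control file point at symlinked device nodes. It scans for `check device` (or `check filesystem`) statements. This is a rough sketch: it only handles the single-line `check device NAME with path PATH` form used in this monitrc and is not a parser for the full Monit grammar.

```python
import os
import re
import sys

# Matches single-line statements such as
#   check device var with path /dev/mapper/ister-var
# Only this simple form is recognised; it is not a full monitrc parser.
CHECK_RE = re.compile(r"^\s*check\s+(?:device|filesystem)\s+(\S+)\s+with\s+path\s+(\S+)")


def filesystem_checks(monitrc_text):
    """Yield (service name, path, resolved path) for each filesystem check."""
    for line in monitrc_text.splitlines():
        match = CHECK_RE.match(line)
        if match:
            name, path = match.groups()
            yield name, path, os.path.realpath(path)


if __name__ == "__main__":
    rc = sys.argv[1] if len(sys.argv) > 1 else "/etc/monit/monitrc"
    if os.path.exists(rc):
        with open(rc) as f:
            for name, path, resolved in filesystem_checks(f.read()):
                note = "" if path == resolved else " (symlink to %s)" % resolved
                print("%s: %s%s" % (name, path, note))
```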