Hugo, I didn't think it relevant to post full details of the hosts/services, as sometimes the commands work and sometimes they don't. It's not a problem with command syntax or with a specific host or service - it's a global thing. If I run the commands manually they work fine, as the manual runs further down show.
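One way to tell whether the intermittent failures come from the plugin itself or from how it gets launched is to run a single check many times as the nagios user and count the exit codes. The following is only a rough sketch, assuming Python is available on the monitoring host; the function name `tally_exit_codes`, the run count and the placeholder command are hypothetical and not part of my setup.

```python
import subprocess
from collections import Counter


def tally_exit_codes(command, runs=20):
    """Run a shell command repeatedly and count each exit code seen."""
    codes = Counter()
    for _ in range(runs):
        # shell=True hands the command line to /bin/sh; a POSIX shell
        # exits with 127 when it cannot find the command to run.
        result = subprocess.run(
            command,
            shell=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes[result.returncode] += 1
    return codes


if __name__ == "__main__":
    # Placeholder command (assumption): substitute a real plugin command line.
    print(tally_exit_codes("true", runs=5))
```

Nagios plugins are expected to exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), so any other code in the tally would point at the way the command is being started rather than at the service being checked.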
Here's an example of a failing PostgreSQL service:

"(Return code of 127 is out of bounds - plugin may be missing)"

Run it manually:

> su -c '/usr/local/nagios/libexec/check_pgsql -H <HOST_IP> -P 5432 -d <DB_NAME> -l <LOGIN_USER> -w 30 -c 60' - nagios
> OK - database <DB_NAME> (0 sec.)|time=0.000000s;30.000000;60.000000;0.000000

If I force an active check of the service, the error clears and the service comes up OK, but then another service fails. Currently I've got 3 failures on services that are actually up and working.

Take another example - the SSH service on the Nagios machine - currently reading "CRITICAL - Server answer:". The flapping state is "Percent State Change:72.70%", which suggests the service is going up and down at random; however, the machine and its SSH service are working fine.

The command for this is:

define command {
    command_name    Check_SSH
    command_line    /usr/local/nagios/libexec/check_ssh -H $ARG1$ -p 3322
}

And the service definition:

define service {
    host_name               Perth,Sydney-1
    use                     Service_Template
    service_description     Encrypted Remote Access - SSH
    check_command           Check_SSH!$HOSTADDRESS$
}

And the same for an HTTP service which is reading "(No status!)", run manually:

> [EMAIL PROTECTED] zones]# su -c '/usr/local/nagios/libexec/check_http -H www.andyshellam.eu -N -p 80 -A "Nagios/2.4/dns.mailnetwork.co.uk" -f follow -w 30 -c 60 -t 120' - nagios
> HTTP OK HTTP/1.1 200 OK - 1023 bytes in 0.006 seconds |time=0.006282s;30.000000;60.000000;0.000000 size=1023B;;;0

Andy.

Hugo van der Kooij wrote:
> On Sun, 27 Aug 2006, Andy Shellam wrote:
>
>> I've been using Nagios for around 5 months now with no problems. I've
>> recently added a new server onto my network, which has added somewhere
>> in the region of another 3 hosts and 12 services onto Nagios.
>>
>> Since then I now keep getting random errors in the "Status Information"
>> for services only.
>>
>> For example I've got a HTTP monitor which monitors
>> http://photos.andyshellam.eu:80, and this has started saying "Name or
>> service not known" or "(No output!)" and labelled with either an OK or
>> CRITICAL state (when the site is actually OK.)
>>
>
> I think you could improve the likelihood of getting help by providing:
> - host definition (+ template if needed)
> - service definition (+ template if needed)
> - checkcommand definition
> - Results of check command as user nagios from the commandline
>
> Hugo.
>

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null