Re: [Nagios-users] Misplaced advice in the Nagios preflight check?
On 6/7/2013 9:28 AM, C. Bensend wrote:
> Not real sure why Nagios doesn't think that's a valid config - I want a contact that will receive only UNKNOWN alerts for services.

Have you tried giving that contact the extra options Nagios wants, and then defining a service escalation for that contact with the escalation_options directive set to u?

--
-Chris

--
This SF.net email is sponsored by Windows: Build for Windows Store.
http://p.sf.net/sfu/windows-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
[Nagios-users] Return code of 127 is out of bounds - plugin may be missing
Send to nagios-users@lists.sourceforge.net

Greetings, all.

I've googled the subject above and evaluated the answers I've found, but haven't yet found info that pinpoints my issue.

I'm running Nagios Core 3.2.1 on RedHat 5.8. This installation has been running for a few years; I just inherited its care and maintenance recently.

On one of my monitored servers I wrote a script, checkRAID.sh, that calls another piece of code, looks at the results, and returns either a 0 or a 2 (the result will always be either good or critical, depending on whether the RAID controller is unhappy).

Nagios runs as user nagios. The remote machine is configured to allow user nagios to log in without a password, using a key pair. This works.

In /usr/local/nagios/etc/checkcommands.cfg I have:

    define command{
        command_name    check_raid
        command_line    /usr/local/nagios/libexec/check_by_ssh -H $HOSTNAME -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh
        }

When I become nagios (su - nagios) and run that command, I get:

    [nagios@nagios ~]$ /usr/local/nagios/libexec/check_by_ssh -H <remote server IP> -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh
    Check failed
    [nagios@nagios ~]$ echo $?
    2
    [nagios@nagios ~]$

That "Check failed" line is what's written to stdout just before returning an exit code of 2. This shows me that the remote script is working fine, and that the local nagios user is able to execute it with no problems.

However, once I add an entry to services.cfg to tie this service check to my remote host and give it time to run the command, when I look at Nagios' Services page it shows:

    check_raid  CRITICAL  06-10-2013 21:17:25  0d 6h 14m 29s  3/3  (Return code of 127 is out of bounds - plugin may be missing)

This has me baffled. The return code is quite clearly 2.

I recently set debug_level to -1 and restarted. I'm hoping that the debug log will

Daniel Mahoney
dm5...@att.com
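For readers following along, a script of the kind described above might look roughly like the sketch below. This is an illustration only: the real checkRAID.sh was not posted, `raid_status` is a hypothetical stand-in for the "other piece of code", and the "Check passed" wording and the `FAKE_RAID_STATE` variable are assumptions. Only the 0/2 exit codes and the "Check failed" line come from the message.

```shell
#!/bin/sh
# Sketch of a two-state Nagios plugin: exit 0 = OK, exit 2 = CRITICAL.

# Hypothetical stand-in for the vendor tool the real script calls.
# FAKE_RAID_STATE exists only so this sketch can be exercised by hand.
raid_status() {
    echo "${FAKE_RAID_STATE:-OK}"
}

check_raid() {
    state=$(raid_status)
    if [ "$state" = "OK" ]; then
        echo "Check passed"      # assumed wording, not from the thread
        return 0
    fi
    echo "Check failed"          # wording quoted in the message
    return 2
}

# A real plugin would finish with:  check_raid; exit $?
```

Nagios takes the service state from the exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the status text from stdout, so any other exit status, such as 127, is reported as out of bounds.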
Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing
Hi,

I haven't researched this or anything, but is there a -v option to check_by_ssh to get the exact error thrown? I'm simply wondering if you have a bad/mismatched key in ~/.ssh/known_hosts or authorized_keys.

(Sorry, I've been too busy to be much help on Nagios lately, guys...)

Cheers!
Jamie

From: MAHONEY, DANIEL [mailto:dm5...@att.com]
Sent: Monday, June 10, 2013 5:27 PM
To: nagios-users@lists.sourceforge.net; MAHONEY, DANIEL
Subject: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing
Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing
No, I'm sure that the key is working. When I become the nagios user and run the exact same command from the command line, it gives me exactly the result I expect.

From: James Pratt [mailto:jpr...@norwich.edu]
Sent: Monday, June 10, 2013 4:38 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing
Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing
On Mon, Jun 10, 2013 at 09:27:21PM +, MAHONEY, DANIEL wrote:
> check_raid  CRITICAL  06-10-2013 21:17:25  0d 6h 14m 29s  3/3  (Return code of 127 is out of bounds - plugin may be missing)
>
> This has me baffled. The return code is quite clearly 2. I recently set debug_level to -1 and restarted. I'm hoping that the debug log will

Exit status 127 often means that exec failed: it wasn't able to find the program/script specified. That could be that check_by_ssh was missing, or that checkRAID.sh was missing, or that checkRAID.sh exited 127 because one of its commands was missing, perhaps because PATH wasn't set as intended, probably missing /usr/local/s?bin or such (I'm wagering it's that).

Your message was truncated, but if further debugging is needed, I'd recommend using strace or sh -x to see which command isn't being found. You could do something like:

    /usr/local/nagios/libexec/check_by_ssh -H <remote server IP> -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C 'sh -x /home/nagios/checkRAID.sh 2>nagios.err'

Or:

    /usr/local/nagios/libexec/check_by_ssh -H <remote server IP> -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C 'strace -e execve /home/nagios/checkRAID.sh 2>nagios.err'

BTW, using su to become a role account is typically unneeded, and (I find) ugly. You can almost always use

    sudo -H -u nagios ...

That works even if the account is locked/disabled/noshell/etc.

Justin
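The point about 127 can be reproduced locally without Nagios or ssh. The sketch below is only a demonstration under assumptions: `run_with_path` is a hypothetical helper name, and `no_such_raid_tool` is a made-up command standing in for whatever checkRAID.sh calls. It runs a command string under a stripped environment and prints the exit status the shell reports.

```shell
#!/bin/sh
# Run a command string with an empty environment except for the given PATH,
# then print the exit status. $1 = PATH to use, $2 = command string.
run_with_path() {
    env -i PATH="$1" /bin/sh -c "$2" >/dev/null 2>&1
    echo $?
}

# The command cannot be found on this PATH, so the shell exits 127.
run_with_path /nonexistent 'no_such_raid_tool --status'   # prints 127
```

Running the real check command the same way, with a PATH as short as the one the daemon or a non-interactive ssh session gets, is one way to see whether the 127 comes from a missing command rather than from the RAID check itself.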
Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing
Ok. Can you look on the remote host, and perhaps set the debug level high(er) on the sshd server and restart/retest, then check /var/log/secure there (or wherever it logs) after a failure?

Sorry, just kinda grasping out there in hopes I can help. That's a weird error; I'd like to know what is causing it, since it's really not clear... :\

From: MAHONEY, DANIEL [mailto:dm5...@att.com]
Sent: Monday, June 10, 2013 5:42 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing
Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing
That is not quite the same, though. When you log in as the nagios user, it gets its environment set up from .bash_profile (or similar) in its home directory, as well as the correct path to .ssh in that home directory. Nagios starts as root and does a setuid to the nagios user, but does not get the same path set.

I too recommend you use the -v option to find out what is going on. Also try specifying the exact path to the key with -i (and those are best kept in a specific directory, with the key name coming from a host macro variable).

On Mon, Jun 10, 2013 at 2:42 PM, MAHONEY, DANIEL <dm5...@att.com> wrote:
> No, I'm sure that the key is working. When I become the nagios user and run the exact same command from the command line, it gives me exactly the result I expect.
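One way to make a plugin script independent of whichever environment it is started from is to pin PATH at the top and check for its tools explicitly. The following is a sketch, not the poster's script: `require_tool` is a hypothetical helper, the directory list is an assumption about where tools might live, and `raid_tool` in the comment is a placeholder name.

```shell
#!/bin/sh
# Pin PATH so the script behaves the same from a login shell, from the
# Nagios daemon, and from a non-interactive ssh session.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH

# Return 3 (UNKNOWN in Nagios terms) with a readable message when a tool
# is missing, instead of letting the shell fail with 127.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "UNKNOWN: $1 not found in PATH=$PATH"
        return 3
    fi
    return 0
}

# Typical use near the top of a plugin (raid_tool is a placeholder):
#   require_tool raid_tool || exit 3
```

With this in place, a missing binary shows up on the Services page as an UNKNOWN with the tool name in the status text, which is easier to act on than "Return code of 127 is out of bounds".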
Re: [Nagios-users] Misplaced advice in the Nagios preflight check?
This is by design, and it is only a warning message. The config is valid and should work as you intended. It doesn't make sense to get a recovery notification for something you never knew was a problem. Unknowns are not considered problems in Nagios logic.

On Mon, Jun 10, 2013 at 1:25 PM, Chris Beattie <cbeat...@geninfo.com> wrote:
> On 6/7/2013 9:28 AM, C. Bensend wrote:
>> Not real sure why Nagios doesn't think that's a valid config - I want a contact that will receive only UNKNOWN alerts for services.
>
> Have you tried giving that contact the extra options Nagios wants, and then defining a service escalation for that contact with the escalation_options directive set to u?
>
> --
> -Chris
Re: [Nagios-users] Misplaced advice in the Nagios preflight check?
> Have you tried giving that contact the extra options Nagios wants, and then defining a service escalation for that contact with the escalation_options directive set to u?

No, I haven't. It *seems* to be working as I intend. My question is more as to why Nagios seems to think it's a bad idea, when it's a perfectly legitimate configuration. Are there unforeseen consequences that I'm not aware of? Or was it just not a configuration anyone thought would be useful/valid, so it is warned about?

--
The very existence of flamethrowers proves that sometime, somewhere, someone said to themselves, 'You know, I want to set those people over there on fire, but I'm just not close enough to get the job done.' -- George Carlin