Re: [Nagios-users] Misplaced advice in the Nagios preflight check?

2013-06-10 Thread Chris Beattie
On 6/7/2013 9:28 AM, C. Bensend wrote: 
 Not real sure why Nagios doesn't think that's a valid config - I
 want a contact that will receive only UNKNOWN alerts for services.

Have you tried giving that contact the extra options Nagios wants, and then 
defining a service escalation for that contact with the escalation_options 
directive set to u?

-- 
-Chris

--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


[Nagios-users] Return code of 127 is out of bounds - plugin may be missing

2013-06-10 Thread MAHONEY, DANIEL

Greetings, all. I've googled the subject above and evaluated the answers I've 
found but haven't yet found info that pinpoints my issue.

I'm running Nagios Core 3.2.1 on RedHat 5.8. This installation has been running
for a few years; I just inherited its care and maintenance recently. On one of
my monitored servers I wrote a script, checkRAID.sh, that calls another piece
of code, looks at the results, and returns either 0 or 2 (the result will
always be either good or critical, depending on whether the RAID controller is
unhappy).

Nagios runs as user nagios. The remote machine is configured to allow user 
nagios to log in without a password, using a key pair. This works.

In /usr/local/nagios/etc/checkcommands.cfg I have:
define command{
        command_name    check_raid
        command_line    /usr/local/nagios/libexec/check_by_ssh -H $HOSTNAME -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh
}

When I become nagios (su - nagios) and run that command, I get:
[nagios@nagios ~]$ /usr/local/nagios/libexec/check_by_ssh -H <remote server IP> -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh
Check failed
[nagios@nagios ~]$ echo $?
2
[nagios@nagios ~]$

That "Check failed" line is what the script writes to stdout just before
returning an exit code of 2. This shows me that the remote script is working
fine and that the local nagios user is able to execute it with no problems.
However, once I add an entry to services.cfg to tie this service check to my
remote host and give it time to run, the Nagios Services page shows:

check_raid  CRITICAL  06-10-2013 21:17:25  0d 6h 14m 29s  3/3  (Return code of 127 is out of bounds - plugin may be missing)

This has me baffled. The return code is quite clearly 2.

I recently set debug_level to -1 and restarted. I'm hoping that the debug log 
will

Daniel Mahoney
dm5...@att.com

Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing

2013-06-10 Thread James Pratt
Hi, I haven't researched this, but is there a -v option to check_by_ssh to get
the exact error thrown? I'm simply wondering if you have a bad or mismatched key
in ~/.ssh/known_hosts or authorized_keys (sorry, I've been too busy to be much
help on Nagios lately, guys)...

Cheers!
Jamie

Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing

2013-06-10 Thread MAHONEY, DANIEL
No, I'm sure that the key is working. When I become the nagios user and run the 
exact same command from the command line, it gives me exactly the result I 
expect.

Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing

2013-06-10 Thread Justin T Pryzby
On Mon, Jun 10, 2013 at 09:27:21PM +, MAHONEY, DANIEL wrote:

 check_raid  CRITICAL  06-10-2013 21:17:25  0d 6h 14m 29s  3/3  (Return code of 127 is out of bounds - plugin may be missing)
 
 This has me baffled. The return code is quite clearly 2.
 
 I recently set debug_level to -1 and restarted. I'm hoping that the debug log 
 will

Exit status 127 often means that exec failed: it wasn't able to
find the program/script specified.  That could mean that check_by_ssh
was missing, that checkRAID.sh was missing, or that checkRAID.sh
exited 127 because one of the commands it runs was missing, perhaps
because PATH wasn't set as intended, probably lacking /usr/local/sbin,
/usr/local/bin or such (I'm wagering it's that).
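As a self-contained illustration of this point (a sketch, not from the thread): a POSIX shell exits with status 127 when it cannot find a command, so the very same invocation can succeed or fail with 127 depending only on PATH. The helper name fake_raid_tool is hypothetical, standing in for whatever checkRAID.sh actually calls, and /usr/bin:/bin is an assumed minimal PATH, not the exact environment Nagios or sshd provides.

```shell
#!/bin/sh
# Sketch: the same command exits 127 or 0 depending only on PATH.
# fake_raid_tool is a hypothetical stand-in for the helper checkRAID.sh calls.

demo_dir=$(mktemp -d)
cat > "$demo_dir/fake_raid_tool" <<'EOF'
#!/bin/sh
echo "controller OK"
EOF
chmod +x "$demo_dir/fake_raid_tool"

# Assumed minimal PATH without the helper's directory: the shell reports 127.
PATH=/usr/bin:/bin /bin/sh -c 'fake_raid_tool' >/dev/null 2>&1
missing_status=$?

# Same command with the helper's directory on PATH: it runs and exits 0.
PATH="$demo_dir:/usr/bin:/bin" /bin/sh -c 'fake_raid_tool' >/dev/null 2>&1
found_status=$?

echo "helper dir not on PATH: exit $missing_status"
echo "helper dir on PATH:     exit $found_status"
rm -rf "$demo_dir"
```

Nothing about the command itself changed between the two runs; only the search path did, which is why a check can work from a login shell and still return 127 under a daemon.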

Your message was truncated, but if further debugging is needed, I'd
recommend using strace or sh -x to see which command isn't being found.

You could do something like:
/usr/local/nagios/libexec/check_by_ssh -H <remote server IP> -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C 'sh -x /home/nagios/checkRAID.sh 2>nagios.err'

Or (strace needs -f to follow the commands the script starts):
/usr/local/nagios/libexec/check_by_ssh -H <remote server IP> -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C 'strace -f -e execve /home/nagios/checkRAID.sh 2>nagios.err'
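For readers unfamiliar with sh -x, here is a self-contained sketch of what its trace looks like when a script calls a missing command. The script check_demo.sh and the command hypothetical_raid_cli are made up for the demo. Trace lines start with "+", and the "not found" message names the command the shell could not locate (the exact wording varies between shells such as dash and bash).

```shell
#!/bin/sh
# Sketch: use "sh -x" to find which command in a check script is missing.
# hypothetical_raid_cli is a made-up name that should not exist on PATH.

demo_dir=$(mktemp -d)
cat > "$demo_dir/check_demo.sh" <<'EOF'
#!/bin/sh
echo "checking RAID controller"
hypothetical_raid_cli --status
exit $?
EOF

# -x prints each command (prefixed with "+") to stderr before running it.
# Capture stderr only and discard the script's normal stdout.
trace=$(sh -x "$demo_dir/check_demo.sh" 2>&1 >/dev/null)
script_status=$?

echo "script exit status: $script_status"
# The "not found" line names the missing command.
printf '%s\n' "$trace" | grep 'not found'
rm -rf "$demo_dir"
```

On the real system, the same information ends up in nagios.err in the remote nagios user's home directory when the commands above are used.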

BTW, using su to become a role account is typically unneeded, and
(I find) ugly.  You can almost always use sudo -H -u nagios ...
That works even if the account is locked/disabled/noshell/etc.
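One defensive option, sketched here under stated assumptions rather than taken from the thread, is for the remote script to set PATH itself and to report a missing helper as UNKNOWN instead of letting the shell return 127. Nagios plugins signal their state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name check_raid, the PATH list, and the "Optimal" status string are all assumptions; real RAID utilities print different text.

```shell
#!/bin/sh
# Hypothetical skeleton for a checkRAID.sh-style check (not the poster's script).
# Pin PATH so the script does not depend on login files; adjust for the host.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH

# Nagios plugin exit codes used here.
STATE_OK=0
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_raid TOOL: run TOOL and map its output to a Nagios state.
# The "Optimal" marker is an assumption; real controller tools differ.
check_raid() {
    tool=$1
    if ! command -v "$tool" >/dev/null 2>&1; then
        # Report the real problem instead of letting the shell return 127.
        echo "UNKNOWN - $tool not found (PATH=$PATH)"
        return $STATE_UNKNOWN
    fi
    if "$tool" | grep -q 'Optimal'; then
        echo "OK - RAID controller reports Optimal"
        return $STATE_OK
    fi
    echo "CRITICAL - RAID controller does not report Optimal"
    return $STATE_CRITICAL
}
```

In an actual script, the last line would call the function and pass its status on, for example check_raid raid_status_tool; exit $? (raid_status_tool being a placeholder for the real controller utility). With that in place, a PATH problem shows up in Nagios as an UNKNOWN naming the missing tool rather than as "Return code of 127 is out of bounds".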

Justin


Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing

2013-06-10 Thread James Pratt
Ok. Can you look on the remote host and perhaps set the sshd log level higher,
restart/retest, and then check /var/log/secure there (or wherever it logs) after
a failure? Sorry, just kind of grasping out there in hopes I can help. That's a
weird error; I'd like to know what is causing it, since it's really not
clear... :\

Re: [Nagios-users] Return code of 127 is out of bounds - plugin may be missing

2013-06-10 Thread William Leibzon
That is not quite the same, though. When you become the nagios user with a
login shell, you get the environment set up by .bash_profile (or similar)
in that user's home directory, as well as the correct path to .ssh in the
nagios user's home directory. Nagios starts as root and does a setuid to
the nagios user, but it does not get the same PATH and environment.

I too recommend you use the -v option to find out what is going on. Also try
specifying the exact path to the key with -i (keys are best kept in a
specific directory, with the key name coming from a host macro variable).
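To approximate this difference without Nagios, a command can be run with an emptied environment. The helper run_like_daemon below is hypothetical, and /usr/bin:/bin is an assumed minimal PATH. This does not reproduce exactly what Nagios passes to its checks, but it strips everything a login shell would have added. Bear in mind that checkRAID.sh itself runs on the remote host, so its PATH is set up on that side by sshd and the nagios user's shell there.

```shell
#!/bin/sh
# Sketch: run a command with an empty environment plus a minimal PATH,
# as a rough stand-in for how a daemon might run it.
# run_like_daemon is a hypothetical helper, not part of Nagios.
run_like_daemon() {
    env -i PATH=/usr/bin:/bin "$@"
}

echo "login shell PATH: $PATH"
# env with no arguments prints the variables the child actually received.
run_like_daemon env

# Example use as the nagios user (not run here):
# run_like_daemon /usr/local/nagios/libexec/check_by_ssh -H <remote server IP> -l nagios \
#     -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh
# echo $?
```

If the check behaves differently under run_like_daemon than it does from "su - nagios", the environment is the likely culprit.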


Re: [Nagios-users] Misplaced advice in the Nagios preflight check?

2013-06-10 Thread Travis Runyard
This is by design, and it is only a warning message. The config is valid
and should work as you intended. It doesn't make sense to get a recovery
notification for something you never knew was a problem. Unknowns are not
considered problems in Nagios logic.


Re: [Nagios-users] Misplaced advice in the Nagios preflight check?

2013-06-10 Thread C. Bensend

 Have you tried giving that contact the extra options Nagios wants, and
 then defining a service escalation for that contact with the
 escalation_options directive set to u?

No, I haven't.  It *seems* to be working as I intend.  My question is
more as to why Nagios seems to think it's a bad idea, when it's a
perfectly legitimate configuration.  Are there unforeseen consequences
that I'm not aware of?  Or was it just not a configuration anyone
thought would be useful/valid, so it is warned about?


-- 
The very existence of flamethrowers proves that sometime, somewhere,
someone said to themselves, 'You know, I want to set those people
over there on fire, but I'm just not close enough to get the job done.'  
   -- George Carlin

