[Nagios-users] Nagios not checking on time

2008-10-06 Thread Marc Torres
Hi all.

I'm using Nagios v3.0 with NDO. My problem is that nagios isn't checking the
services at the time it says it will (from NDO tables):

+---------------+---------------------+---------------------+---------------------+
| current_state | display_name        | next_check          | last_check          |
+---------------+---------------------+---------------------+---------------------+
|             2 | CPU Load            | 2008-10-06 10:00:00 | 2008-10-03 12:16:35 |
|             2 | Drive Space C       | 2008-10-06 10:00:00 | 2008-10-03 12:03:39 |
|             0 | Explorer            | 2008-10-06 10:00:00 | 1970-01-01 01:00:00 |
|             0 | Memory Usage        | 2008-10-06 10:00:00 | 1970-01-01 01:00:00 |
|             0 | NSClient++ Version  | 2008-10-06 10:00:00 | 1970-01-01 01:00:00 |
|             0 | Registro de Sucesos | 2008-10-06 10:00:00 | 1970-01-01 01:00:00 |
|             0 | Uptime              | 2008-10-06 10:00:00 | 1970-01-01 01:00:00 |
+---------------+---------------------+---------------------+---------------------+

but now it's 2008-10-06 12:10 and no checks were done at all on these
services. I should say that this host is down, but nowhere is it written that
the service checks should not be made when Nagios detects that the host is
down (I've read the check_hosts and check_services docs).

Am I missing something? Or is this normal behaviour?
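
A note on the last_check column: 1970-01-01 01:00:00 is most likely the Unix
epoch (timestamp 0) rendered in a UTC+1 timezone, which would mean no check
has been recorded for those services rather than a real check on that date.
A minimal Python sketch of that reading follows. The UTC+1 offset is an
assumption (the post does not state the server's timezone), and the helper
names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the database server renders datetimes at UTC+1.
ASSUMED_TZ = timezone(timedelta(hours=1))


def render_epoch(ts):
    """Format a Unix timestamp in the same layout as the NDO query output."""
    return datetime.fromtimestamp(ts, ASSUMED_TZ).strftime("%Y-%m-%d %H:%M:%S")


def never_checked(last_check):
    """True when last_check equals timestamp 0, i.e. no check was recorded."""
    return last_check == render_epoch(0)


print(render_epoch(0))
print(never_checked("2008-10-03 12:16:35"))
```

Under that assumption, five of the seven services have never been checked at
all, while the two CRITICAL ones were last checked on 2008-10-03.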

Cheers,

Marc.

PS: If you need more info to help me (like host and service config), let
me know.

-- 
I'm unique, just like everyone else. Read it out there
-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

[Nagios-users] notify-html-email.sh

2008-10-06 Thread Nagios User
Hello,
I am trying to use the notify-html-email.sh mail script on Nagios 3.0.1
with no success. Does this work with Nagios 3.0.1? If you have any
information please let me know.

http://www.nagiosexchange.org/cgi-bin/page.cgi?g=2002.html;d=1


Thanks,
-john



[Nagios-users] Time periods seem to be no-ops

2008-10-06 Thread Patrick Rutkowski
We've defined the following time period for use with a service which we
know to be down between 9am and 10am every day:

== BEGIN CODE SNIPPET ==
define timeperiod{
 timeperiod_name do_not_notify_bw_9am_10am
 alias           Every day and time, except 9am-10am
 sunday          00:00-8:59,10:01-24:00
 monday          00:00-8:59,10:01-24:00
 tuesday         00:00-8:59,10:01-24:00
 wednesday       00:00-8:59,10:01-24:00
 thursday        00:00-8:59,10:01-24:00
 friday          00:00-8:59,10:01-24:00
 saturday        00:00-8:59,10:01-24:00
 }
== END CODE SNIPPET ==
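
To see which times of day those ranges cover, here is a small Python sketch.
It is not Nagios's own parser: it assumes each range is start-inclusive and
end-exclusive at minute resolution, and the function names are made up for
illustration.

```python
def to_minutes(hhmm):
    """Convert 'H:MM' or 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_ranges(spec):
    """Parse 'start-end,start-end' into a list of (start, end) minute pairs."""
    ranges = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        ranges.append((to_minutes(start), to_minutes(end)))
    return ranges


def in_period(spec, hhmm):
    """True if the time of day falls inside any range (assumed end-exclusive)."""
    t = to_minutes(hhmm)
    return any(start <= t < end for start, end in parse_ranges(spec))


SPEC = "00:00-8:59,10:01-24:00"
for probe in ("08:30", "09:30", "10:00", "10:30"):
    print(probe, in_period(SPEC, probe))
```

Under these assumptions the period excludes everything from 08:59 up to
10:01, slightly more than the intended 9am-10am window.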

We then applied it to a service like so:

== BEGIN CODE SNIPPET ==
define service {
 host_name               our-special-host
 service_description     Check status of MySQL reporting:3307
 check_command           check_mysql_service!16105
 use                     generic-service
 notification_period     do_not_notify_bw_9am_10am
 notification_interval   0
 contact_groups          company-admins
 }
== END CODE SNIPPET ==

Oddly, the service continues to give us notifications during that time
interval. We tried setting check_interval instead of
notification_interval, but still no luck. We also tried setting both
{notification,check}_interval, but again, no luck.

To preempt the obvious question, yes, we did restart
Nagios. Additionally, to ensure that the time period was indeed
getting attached to the service, we used the GUI page to browse to
Configuration - View Config - Object Type: Services and
verified that the columns entitled Check Period and Notification
Period had do_not_notify_bw_9am_10am instead of the usual 24x7.

Despite all this, we're still getting warnings during this time
interval.

As a quick fix, we can use SCHEDULE_SVC_DOWNTIME with
/var/lib/nagios2/rw/nagios.cmd, which does work; but it definitely
doesn't feel like a permanent solution.
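
For reference, the SCHEDULE_SVC_DOWNTIME workaround can be scripted (for
example from cron). The Python sketch below builds the command line in the
documented external-command format
(host;service;start;end;fixed;trigger_id;duration;author;comment). The
timestamps, author and comment are hypothetical example values, and the
service description assumes the two wrapped lines in the snippet above form
one name.

```python
import time


def svc_downtime_command(host, service, start, end, author, comment, now=None):
    """Build a fixed SCHEDULE_SVC_DOWNTIME line for the external command file."""
    now = int(time.time()) if now is None else int(now)
    fixed = 1        # fixed downtime: runs exactly from start to end
    trigger_id = 0   # not triggered by another downtime entry
    duration = end - start
    return (f"[{now}] SCHEDULE_SVC_DOWNTIME;{host};{service};"
            f"{start};{end};{fixed};{trigger_id};{duration};{author};{comment}\n")


def submit(line, cmd_file="/var/lib/nagios2/rw/nagios.cmd"):
    """Write one command line to the Nagios command pipe."""
    with open(cmd_file, "w") as pipe:
        pipe.write(line)


line = svc_downtime_command(
    "our-special-host", "Check status of MySQL reporting:3307",
    start=1223283600, end=1223287200,          # example one-hour window
    author="cron", comment="known daily outage",  # hypothetical values
    now=1223283000)
print(line, end="")
```

Calling submit(line) on the Nagios host would hand the command to the daemon;
the sketch only prints it.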

-Patrick




Re: [Nagios-users] 3-D Status Map not working

2008-10-06 Thread Mark Young

On Oct 3, 2008, at 6:31 PM, Rick Knight wrote:
> The only config file I have that points to /usr/local/nagios/sbin is
> the Apache2 config file for Nagios, nagios.conf. I've included it
> below...
>
> ## Apache2 config file for Nagios

This looks to be the standard sample config for Apache, which should
not be a problem.



> What is the next step in trouble shooting this problem?

I believe the clear next step for you is to work on your configuration
files.  I would back up your configuration, stop Nagios, install the
sample config files ('make install-config' in the source directory),
then start Nagios.  If that fixes the problem, I'd look at cgi.cfg
and nagios.cfg as having some problem settings.  If it does not
work, it could be a number of things, including the openvrml plugin
you are using.  Closer tracing of the program and a deeper look
into the logs might be required.

The VRML components have largely been unmaintained for a long time.
VRML technology never really took off.  And sadly the 3D map is not as
useful as you may hope. :(  If you are looking for enhanced maps, I
would look at the great NagVis (http://www.nagvis.org/) project.  If
you are looking for more enhanced interactive mapping, however, you may
have to wait for something coming in the near future. ;)


Mark Young
___
Nagios Enterprises, LLC
Web:www.nagios.com



[Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington
Hi,

I have a couple machines that spit out a warning similar to this:

WARNING - check_by_ssh: Remote command '/home/nagios/nagios-plugs/check_disk' returned status 1

I believe this is caused by the check itself timing out, because when
I try to log in it will sometimes take up to a minute or two just to
get a prompt.

The server will respond to ping, so I'm generally not too
concerned about it. And the checks usually clear up in 5 minutes or
as soon as the server gets whatever IO hog out of the way.

Is anyone else experiencing this, and if so how do you cope / deal  
with this?

Thanks,

Charlie



Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington
I should also mention that I have these timeouts in place...

service_check_timeout=90
host_check_timeout=30
event_handler_timeout=30
notification_timeout=60
ocsp_timeout=5
perfdata_timeout=5

Charlie





Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Matt Rivet
Are you using an LDAP server and RSA keys?





Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread James
On Mon, October 6, 2008 11:37 am, Charlie Reddington wrote:
> I should also mention that I also have these timeouts in place...
>
> service_check_timeout=90
> host_check_timeout=30
> event_handler_timeout=30
> notification_timeout=60
> ocsp_timeout=5
> perfdata_timeout=5
>
> Charlie



The timeouts in nagios.cfg are how long the Nagios process waits before
aborting a check.
There are usually check-specific timeouts that you can add to the command
definition.
Run the check_* command manually and see what the syntax is (sometimes
'-t xx').
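
As an illustration of running a check by hand, the Python sketch below runs a
plugin command under an outer time limit and maps its exit status to the
standard plugin states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The
run_plugin helper is hypothetical, and the demo uses a stand-in command in
place of a real plugin; substitute your own check_* command line.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout_s):
    """Run a plugin by hand; return (state, first line of its output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The outer limit fired before the plugin produced a result.
        return "UNKNOWN", f"no result within {timeout_s}s"
    lines = proc.stdout.splitlines()
    return STATES.get(proc.returncode, "UNKNOWN"), (lines[0] if lines else "")


# Stand-in for a real plugin: prints one line and exits with status 1.
fake_plugin = [sys.executable, "-c",
               "print('DISK WARNING'); raise SystemExit(1)"]
print(run_plugin(fake_plugin, 10))
```

If the outer limit is shorter than the plugin's own timeout, the plugin never
gets the chance to report its own result, which is why the two values are
worth tuning together.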




Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington
Sorry, forgot the mailing list.

I'm not using LDAP, but I am using DSA keys.

On Oct 6, 2008, at 10:58 AM, Matt Rivet wrote:

> Are you using a LDAP server and RSA keys?


 


Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Charlie Reddington

On Oct 6, 2008, at 11:03 AM, James wrote:

> The timeouts in nagios.cfg are how long the nagios process waits before
> aborting a check.
> There are usually check specific timeouts that you can add to the
> command definition.
> Run the check_* command manually and see what the syntax is
> (sometimes '-t xx').


I thought I had done that already and put the --timeout option
on check_by_ssh, but I guess not. I added the timeout, going from 30
to 60.  We'll see how it goes.

Charlie



[Nagios-users] nrpe: plugins on remove host?

2008-10-06 Thread Marcelo M. Garcia
Hi

To monitor a remote host using nagios-plugins and NRPE, do I need to
have Nagios and the plugins, or just the plugins, on the remote host?

Thanks

Marcelo



Re: [Nagios-users] check_by_ssh timeouts / how to work around?

2008-10-06 Thread Matthew Pounsett




> I believe this to be caused by the check itself is timing out. As when
> I try to login it will sometimes take up to a minute or two just to
> get a prompt.




As for setting the timeouts for that sort of thing, this is what I do.

In my resource.cfg:
--
# check_by_ssh timeout
$USER4$=10
--

.. and in my commands.cfg definitions..
---
# 'check_disk_remote' command definition
define command {
        command_name    check_disk_remote
        command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -t $USER4$ -C "$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$"
}
---

And I use the same $USER4$ definition for all of the check_by_ssh  
calls, so that it's easy to tune.
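
To make the $USER4$ indirection concrete, here is a rough Python sketch of
how $NAME$ macros in a command_line get replaced by their values. It is a
simplified stand-in, not Nagios's macro processor; the plugin directory and
host address are assumed example values.

```python
import re


def expand_macros(command_line, macros):
    """Replace each $NAME$ with its value; leave unknown macros untouched."""
    def substitute(match):
        name = match.group(1)
        return str(macros.get(name, match.group(0)))
    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


template = "$USER1$/check_by_ssh -H $HOSTADDRESS$ -t $USER4$"
values = {
    "USER1": "/usr/local/nagios/libexec",  # assumed plugin directory
    "HOSTADDRESS": "192.0.2.10",           # documentation-range example address
    "USER4": 10,                           # the shared check_by_ssh timeout
}
print(expand_macros(template, values))
```

Changing the single USER4 value changes the timeout in every command that
references it, which is the tuning convenience described above.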


Have you looked into the reason for the long login delay though?  I  
think I'd start there.  A 60 second wait for ssh to get you a shell  
indicates some sort of problem.  Either the target machine is so  
resource starved that it can't negotiate the authentication and  
encryption, or you've got some other delay in there.  The most likely  
culprit to my mind is DNS -- ssh itself, login and your shell on the  
target machine might all be trying to do a reverse DNS lookup on the  
source of the connection.  If that's timing out, it could cause very  
long delays.   There are lots of other potential problems, but I'd  
start looking there.
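
One quick way to test the DNS theory is to time a reverse lookup on the
target machine. The Python sketch below does that with the standard socket
module; the function name and its resolver parameter are made up for
illustration, and the address to test (the Nagios server's IP) is left to
the caller.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (elapsed seconds, hostname or None) for one reverse lookup."""
    started = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # herror and gaierror are OSError subclasses: lookup failed.
        hostname = None
    return time.monotonic() - started, hostname
```

Run time_reverse_lookup("<nagios server IP>") on the slow host: an elapsed
time of many seconds, or a None hostname after a long wait, would point at
name resolution rather than at ssh itself.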






Re: [Nagios-users] nrpe: plugins on remove host?

2008-10-06 Thread Andy Shellam
Hi Marcelo,

To monitor a remote host, you need the check_nrpe plugin on the Nagios 
server (this is built as part of the NRPE package) and on each remote 
host you need the NRPE server and the plugins.

On a side-note, you also need OpenSSL if you want the channel between 
Nagios and the remote hosts to be encrypted.

Andy




[Nagios-users] Simulating downtime in nagios

2008-10-06 Thread Kelly Jones
What's the best way to simulate (not schedule) downtime in nagios?

I want to pretend a service is down for a certain amount of time to
see what alerts nagios sends, etc.

I've come up w/ two bad ways to do this:

 % Edit the config file to change the test to check_dummy. I want to
 run these fire drills via cron, and editing a file and restarting
 nagios seems a little ugly.

 % Submit a passive check saying the service is down, and reschedule
 the next check 4 hours later, so the service is 'down' for 4
 hours. This can be done using the nagios named pipe, so it's easy to
 cron. Problem: doing things this way suppresses the alerts (when you
 don't test a service, it doesn't send an alert).
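
For reference, the passive result in the second approach is a single line
written to the named pipe. The Python sketch below builds it in the
documented PROCESS_SERVICE_CHECK_RESULT format; the host, service and output
text are hypothetical example values.

```python
import time


def passive_result(host, service, return_code, output, now=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT line.

    return_code follows plugin conventions: 0 OK, 1 WARNING,
    2 CRITICAL, 3 UNKNOWN.
    """
    now = int(time.time()) if now is None else int(now)
    return (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


# Hypothetical host and service names.
print(passive_result("foo", "HTTP", 2, "CRITICAL - fire drill",
                     now=1223300000), end="")
```

The service must accept passive checks for Nagios to process such a line.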

Thoughts?

-- 
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.



Re: [Nagios-users] Simulating downtime in nagios

2008-10-06 Thread Andy Shellam
Hi Kelly,

When I've done this in the past, for network services (e.g. http/smtp 
checks) I've actually blocked the target port on the Nagios server, 
which gives a better simulation that the service is down (e.g. for HTTP 
checks, block the Nagios server's outbound port 80.) 

This works for us because as well as the router firewalls, each server 
runs a local software firewall, so it's easy to block outbound packets 
to a particular port on the Nagios server without affecting the service 
itself, simulating the effect of a network/service failure.

However, when it comes to checks such as disk space, it can be a bit
trickier!  I've done things like changing the thresholds for a failure
(e.g. if disk space is currently 15% capacity, I set my warning alert to
be 20%, restart Nagios and wait for the alerts to come, and the same for
critical, then reset back to 90% when complete).  I have also previously
done as you suggested: change the service's check and retry intervals in
Nagios to be something lengthy (e.g. an hour), then submit a passive
'failure' check result and wait until Nagios re-checks the service.
This method also checks how Nagios alerts you when the service returns
to OK.

Hope this helps, it'd be interesting to hear how/if others do it!

Andy


   



Re: [Nagios-users] Simulating downtime in nagios

2008-10-06 Thread Hugo van der Kooij
Kelly Jones wrote:
> What's the best way to simulate (not schedule) downtime in nagios?

Why do you care to do this in a live environment?

I think you should consider these points:

 1. Duplicate your production environment (nagios server) into a test
environment and play all you want.

 2. Tell us what you suspect is not working and what you think this
simulation will tell you to solve it.

Hugo.

-- 
[EMAIL PROTECTED]   http://hugo.vanderkooij.org/
PGP/GPG? Use: http://hugo.vanderkooij.org/0x58F19981.asc

A: Yes.
Q: Are you sure?
A: Because it reverses the logical flow of conversation.
Q: Why is top posting frowned upon?

Bored? Click on http://spamornot.org/ and rate those images.



Re: [Nagios-users] Simulating downtime in nagios

2008-10-06 Thread Tom Throckmorton
On Oct 06 18:57, Kelly Jones wrote:
> Thanks, Tom.
>
> Yes, I'm trying to simulate a host/service outage, not scheduled downtime.
>
> The problem w/ submitting a passive check is that the next ACTIVE check will
> invalidate it. Example: you tell nagios that machine foo is down. That's soft
> alert 1, not enough to generate any emails. Nagios then active checks foo and
> sees that it's up. Of course, you can submit another passive check, but
> you'll ping-pong (flap) between up and down states.

OK, so it sounds like you want to be able to have Nagios temporarily stop
managing the service check scheduling for this service, long enough for you to
inject some bogus results.  Seems like rescheduling the next active check
(SCHEDULE_FORCED_SVC_CHECK) would do the right thing as far as pushing the next
scheduled check into the future.  Or maybe you want to disable active checks
for the service (DISABLE_SVC_CHECK), run your simulation, and then re-enable
them...?
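
A minimal Python sketch of the second idea, using the DISABLE_SVC_CHECK,
ENABLE_SVC_CHECK and SCHEDULE_FORCED_SVC_CHECK external commands. The helper
names, host and service are hypothetical; the passive failure results
themselves would be submitted between the two steps.

```python
def external_command(name, *args, now):
    """Format one external command line: [timestamp] NAME;arg1;arg2..."""
    fields = ";".join(str(a) for a in (name, *args))
    return f"[{int(now)}] {fields}\n"


def start_drill(host, service, now):
    """Stop active checks so they cannot overwrite the injected results."""
    return [external_command("DISABLE_SVC_CHECK", host, service, now=now)]


def end_drill(host, service, now):
    """Re-enable active checks and force one immediately to restore truth."""
    return [
        external_command("ENABLE_SVC_CHECK", host, service, now=now),
        external_command("SCHEDULE_FORCED_SVC_CHECK", host, service,
                         int(now), now=now),
    ]


# Hypothetical host and service names.
for line in start_drill("foo", "HTTP", now=1223300000):
    print(line, end="")
for line in end_drill("foo", "HTTP", now=1223303600):
    print(line, end="")
```

Each returned line would be written to the Nagios command pipe; the forced
check at the end makes the service recover as soon as the drill is over
rather than at its next regular check.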


-tt




Re: [Nagios-users] Simulating downtime in nagios

2008-10-06 Thread Kelly Jones
Thanks, Tom.

Yes, I'm trying to simulate a host/service outage, not scheduled downtime.

The problem w/ submitting a passive check is that the next ACTIVE
check will invalidate it. Example: you tell nagios that machine foo is
down. That's soft alert 1, not enough to generate any emails. Nagios
then active checks foo and sees that it's up. Of course, you can
submit another passive check, but you'll ping-pong (flap) between up
and down states.

-- 
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.

On 10/6/08, Tom Throckmorton [EMAIL PROTECTED] wrote:
> On Oct 06 12:29, Kelly Jones wrote:
>> What's the best way to simulate (not schedule) downtime in nagios?
>>
>> I want to pretend a service is down for a certain amount of time to
>> see what alerts nagios sends, etc.
>
> Just to clarify, are you trying to simulate a service outage (as opposed to
> simulating a scheduled downtime) so you can test alerts, and perhaps
> notifications, in order to validate your configuration?
>
>> I've come up w/ two bad ways to do this:
>>
>>  % Edit the config file to change the test to check_dummy. I want to
>>  run these fire drills via cron, and editing a file and restarting
>>  nagios seems a little ugly.
>>
>>  % Submit a passive check saying the service is down, and reschedule
>>  the next check 4 hours later, so the service is 'down' for 4
>>  hours. This can be done using the nagios named pipe, so it's easy to
>>  cron.  Problem: doing things this way suppresses the alerts (when you
>>  don't test a service, it doesn't send an alert).
>>
>> Thoughts?
>
> I use something similar to the second method to do ad hoc validation of
> alerts/notifications, by submitting passive results via an external command,
> though without diddling the service check scheduling.  I'm a little confused
> by your last statement though...
>
> If you're only submitting a single passive check and then rescheduling the
> next check, of course there will be no alerts (and you'll likely never reach
> $max_check_attempts) - is there some reason you can't submit multiple
> passive check results?
>
> -tt



Re: [Nagios-users] Simulating downtime in nagios

2008-10-06 Thread Tom Throckmorton
On Oct 06 12:29, Kelly Jones wrote:
> What's the best way to simulate (not schedule) downtime in nagios?
>
> I want to pretend a service is down for a certain amount of time to
> see what alerts nagios sends, etc.

Just to clarify, are you trying to simulate a service outage (as opposed to
simulating a scheduled downtime) so you can test alerts, and perhaps
notifications, in order to validate your configuration?

> I've come up w/ two bad ways to do this:
>
>  % Edit the config file to change the test to check_dummy. I want to
>  run these fire drills via cron, and editing a file and restarting
>  nagios seems a little ugly.
>
>  % Submit a passive check saying the service is down, and reschedule
>  the next check 4 hours later, so the service is 'down' for 4
>  hours. This can be done using the nagios named pipe, so it's easy to
>  cron.  Problem: doing things this way suppresses the alerts (when you
>  don't test a service, it doesn't send an alert).
>
> Thoughts?

I use something similar to the second method to do ad hoc validation of
alerts/notifications, by submitting passive results via an external command,
though without diddling the service check scheduling.  I'm a little confused by
your last statement though...

If you're only submitting a single passive check and then rescheduling the next
check, of course there will be no alerts (and you'll likely never reach
$max_check_attempts) - is there some reason you can't submit multiple passive
check results?
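
Submitting several passive results in a row could look roughly like the sketch below. It assumes the standard `PROCESS_SERVICE_CHECK_RESULT` external command and takes the command-file path as a parameter. The host name `foo`, the service name `HTTP`, the plugin output text, and both helper names are hypothetical; how many results are needed depends on the service's max_check_attempts.

```python
import time


def passive_result_line(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def submit_passive_results(command_file, host, service, return_code, output, count):
    """Write `count` identical passive results to the Nagios command file."""
    with open(command_file, "w") as pipe:
        for _ in range(count):
            pipe.write(passive_result_line(host, service, return_code, output))
            pipe.flush()  # hand each command to the pipe right away


# Example call (hypothetical host/service; the path depends on the installation):
# submit_passive_results("/path/to/nagios.cmd", "foo", "HTTP",
#                        2, "CRITICAL - fire drill", count=3)
```

Keeping the line formatting separate from the write makes it easy to inspect the exact command text before pointing the helper at a live command pipe.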

-tt



Re: [Nagios-users] Time periods seem to be no-ops

2008-10-06 Thread Andreas Ericsson
Patrick Rutkowski wrote:
> We've defined the following time period for use with a service which we
> know to be down between 9am and 10am every day:
>
> == BEGIN CODE SNIPPET ==
> define timeperiod{
>         timeperiod_name do_not_notify_bw_9am_10am
>         alias           Every day and time, except 9am-10am
>         sunday          00:00-8:59,10:01-24:00
>         monday          00:00-8:59,10:01-24:00
>         tuesday         00:00-8:59,10:01-24:00
>         wednesday       00:00-8:59,10:01-24:00
>         thursday        00:00-8:59,10:01-24:00
>         friday          00:00-8:59,10:01-24:00
>         saturday        00:00-8:59,10:01-24:00
>         }
> == END CODE SNIPPET ==
>
> We then applied it to a service like so:
>
> == BEGIN CODE SNIPPET ==
> define service {
>         host_name               our-special-host
>         service_description     Check status of MySQL reporting:3307
>         check_command           check_mysql_service!16105
>         use                     generic-service
>         notification_period     do_not_notify_bw_9am_10am
>         notification_interval   0
>         contact_groups          company-admins
>         }
> == END CODE SNIPPET ==
>
> Oddly, the service continues to give us notifications during that time
> interval. We tried setting check_interval instead of
> notification_interval, but still no luck. We also tried setting both
> {notification,check}_interval, but again, no luck.
>
> To preempt the obvious question, yes, we did restart
> Nagios. Additionally, to ensure that the time period was indeed
> getting attached to the service, we used the GUI page to browse to
> Configuration -> View Config -> Object Type: Services and
> verified that the columns entitled Check Period and Notification
> Period had do_not_notify_bw_9am_10am instead of the usual 24x7.
>
> Despite all this, we're still getting warnings during this time
> interval.

I take it the system clock on the Nagios server is running correctly?

> As a quick fix, we can use SCHEDULE_SVC_DOWNTIME with
> /var/lib/nagios2/rw/nagios.cmd, which does work; but it definitely
> doesn't feel like a permanent solution.

Make sure you don't have multiple nagios instances running.

Timeperiods clearly work, or hundreds of thousands of people would
have complained on a daily basis. Something else must be going wrong,
but I can't for the life of me think of what.
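
One quick sanity check is to confirm what the ranges as written actually cover. The sketch below is a simplified reading of the `HH:MM-HH:MM` range syntax, not Nagios's own parser; the function names are hypothetical, and treating the start as inclusive and the end as exclusive is an assumption.

```python
def parse_ranges(spec):
    """Turn '00:00-8:59,10:01-24:00' into a list of (start, end) minute pairs."""
    ranges = []
    for part in spec.split(","):
        start, end = part.strip().split("-")
        sh, sm = (int(x) for x in start.split(":"))
        eh, em = (int(x) for x in end.split(":"))
        ranges.append((sh * 60 + sm, eh * 60 + em))
    return ranges


def in_timeperiod(spec, hhmm):
    """Return True if the time 'HH:MM' falls inside any range of spec."""
    h, m = (int(x) for x in hhmm.split(":"))
    minute = h * 60 + m
    # Assumption: start is inclusive, end is exclusive.
    return any(start <= minute < end for start, end in parse_ranges(spec))


spec = "00:00-8:59,10:01-24:00"
for t in ("08:30", "09:30", "10:01"):
    print(t, in_timeperiod(spec, t))
```

Under these assumptions 08:30 and 10:01 are inside the period and 09:30 is outside it, so the definition appears to cover what its alias says.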

-- 
Andreas Ericsson
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231
