Re: [Nagios-users] Nagios Core 3.2.3 host check retry interval

2010-12-01 Thread Chris Beattie
On Fri, 2010-11-19 at 11:20 -0500, Chris Beattie wrote:
 
 This time I'm trying a nearly-stock nagios.cfg file.  The one I've
 been using predates Nagios 3.0.  Though it's been updated some, it doesn't
 contain all the more-recent settings.

I was out of town for a bit.

This is still happening, but not all the time.  Most of the host checks
happen 70 seconds apart, but the too-closely spaced ones are usually 20
seconds apart. I don't know how long this has been the case.  It turns
out it doesn't usually result in a notification, so nobody's
complaining.

[11-30-2010 17:13:03] SERVICE ALERT: bgcprodiceweb4d;Service: ScaleOut;CRITICAL;SOFT;1;SOSS: Not found
[11-30-2010 17:14:33] SERVICE ALERT: bgcprodiceweb4d;Service: AntiVirus;WARNING;SOFT;1;No data was received from host!
[11-30-2010 17:14:43] HOST ALERT: bgcprodiceweb4d;DOWN;SOFT;1;CRITICAL - 10.3.54.208: rta nan, lost 100%
[11-30-2010 17:15:03] HOST ALERT: bgcprodiceweb4d;UP;SOFT;2;OK - 10.3.54.208: rta 33.504ms, lost 0%


Nothing in this message is intended to make or accept an offer or to form a 
contract, except that an attachment that is an image of a contract bearing the 
signature of an officer of our company may be or become a contract. This 
message (including any attachments) is intended only for the use of the 
individual or entity to whom it is addressed. It may contain information that 
is non-public, proprietary, privileged, confidential, and exempt from 
disclosure under applicable law or may constitute as attorney work product. If 
you are not the intended recipient, we hereby notify you that any use, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this message in error, please notify us immediately by 
telephone and delete this message immediately.

Thank you.


--
Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
Tap into the largest installed PC base & get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Nagios Core 3.2.3 host check retry interval

2010-11-18 Thread Chris Beattie

On Tue, 2010-11-16 at 22:52 +0100, Andreas Ericsson wrote:
 That one was in 3.2.2 too though. Could you try un-commenting the lines
 mentioned there and see if that helps?

It looks like something weird is still happening after making that
change. I checked some more hosts, and the retries still come sooner
than the retry_interval should allow, but only for HOST UP alerts.


[11-18-2010 01:23:31] SERVICE ALERT: hcsprodnwweb5;Service: Epilog;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[11-18-2010 01:23:41] HOST ALERT: hcsprodnwweb5;DOWN;SOFT;1;CRITICAL - 10.3.2.177: rta nan, lost 100%
[11-18-2010 01:24:01] HOST ALERT: hcsprodnwweb5;UP;SOFT;2;OK - 10.3.2.177: rta 1.943ms, lost 0%

[11-18-2010 01:32:51] HOST ALERT: wwwhost;DOWN;SOFT;2;CRITICAL - 10.3.1.11: rta nan, lost 100%
[11-18-2010 01:34:02] HOST ALERT: wwwhost;DOWN;HARD;3;CRITICAL - 10.3.1.11: rta nan, lost 100%
[11-18-2010 01:34:21] HOST ALERT: wwwhost;UP;HARD;1;OK - 10.3.1.11: rta 115.733ms, lost 20%


But sometimes it works the way I expect it to.


[11-18-2010 01:38:41] HOST ALERT: wwwhost;DOWN;SOFT;2;CRITICAL - 10.3.1.11: rta nan, lost 100%
[11-18-2010 01:39:51] HOST ALERT: wwwhost;DOWN;HARD;3;CRITICAL - 10.3.1.11: rta 488.367ms, lost 80%
[11-18-2010 01:49:21] HOST ALERT: wwwhost;UP;HARD;1;OK - 10.3.1.11: rta 31.928ms, lost 0%


I'm going to try reverting to Nagios 3.2.1 to see what happens.
It's possible I had the problem then but never noticed.



Re: [Nagios-users] Nagios Core 3.2.3 host check retry interval

2010-11-17 Thread Chris Beattie

On Tue, 2010-11-16 at 22:52 +0100, Andreas Ericsson wrote:

 http://git.op5.org/git/?p=nagios.git;a=commitdiff;h=1149d275011d7c4d8631b44dbba30ebdb4d7e83f
 
 That one was in 3.2.2 too though. Could you try un-commenting the lines
 mentioned there and see if that helps? I won't revert that patch, but it

Thanks for the help. To make sure I've done what you asked correctly,
here's what I did: I removed lines 1415 and 1419 (shown below) from
checks.c, ran make clean, make all, and make install, and then restarted
Nagios.

1414:/* Below removed 08/04/2010 EG - http://tracker.nagios.org/view.php?id=128 */
1415-/*
1416-temp_service->state_type=HARD_STATE;
1417-temp_service->last_hard_state=temp_service->current_state;
1418-temp_service->current_attempt=1;
1419-*/

If that's right, I'll keep an eye on the frequency of our host alerts
and see what happens.



Re: [Nagios-users] Nagios Core 3.2.3 host check retry interval

2010-11-17 Thread Andreas Ericsson
On 11/17/2010 03:55 PM, Chris Beattie wrote:
 
 On Tue, 2010-11-16 at 22:52 +0100, Andreas Ericsson wrote:
 
 http://git.op5.org/git/?p=nagios.git;a=commitdiff;h=1149d275011d7c4d8631b44dbba30ebdb4d7e83f

 That one was in 3.2.2 too though. Could you try un-commenting the lines
 mentioned there and see if that helps? I won't revert that patch, but it
 
 Thanks for the help.  So I can make sure I've correctly done what you
 asked, this is what I did.  I removed lines 1415 and 1419 below from
 checks.c, then did a make clean, make all, make install, and restarted
 Nagios.
 

That sounds about right, yes.

 1414:/* Below removed 08/04/2010 EG - http://tracker.nagios.org/view.php?id=128 */
 1415-/*
 1416-temp_service->state_type=HARD_STATE;
 1417-temp_service->last_hard_state=temp_service->current_state;
 1418-temp_service->current_attempt=1;
 1419-*/
 
 If that's right, I'll keep an eye on the frequency of our host alerts
 and see what happens.
 

Neat. Thanks.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.



[Nagios-users] Nagios Core 3.2.3 host check retry interval

2010-11-16 Thread Chris Beattie
I noticed something curious. It looks like Nagios 3.2.3 is running
on-demand host checks more often than the retry_interval should allow.
The interval_length is set to 60 and the retry_interval is set to 1.
Nagios and the plugins were compiled from source on CentOS 5.5 x64.

 

I'm not sure if this is related to Yu Watanabe's problem
(http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg34042.html)
because I didn't start having it until after I upgraded to 3.2.3.

Here are some alerts from October, when I was running Nagios 3.2.1.
There were service alerts too, but the host checks were never less
than one minute apart:

--
[10-10-2010 06:41:29] HOST ALERT: wwwhost;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 50.10 ms
[10-10-2010 06:28:40] HOST ALERT: wwwhost;DOWN;HARD;3;PING CRITICAL - Packet loss = 100%
[10-10-2010 06:27:29] HOST ALERT: wwwhost;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
[10-10-2010 06:26:19] HOST ALERT: wwwhost;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%
--

Here are some from earlier this month, after I'd switched from check_ping
to check_icmp. Again, there were service alerts, but the host checks
are still about a minute apart:

--
[11-07-2010 21:55:53] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 4.480ms, lost 0%
[11-07-2010 21:54:43] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta nan, lost 100%
--
[11-09-2010 23:40:15] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 1.018ms, lost 0%
[11-09-2010 23:39:15] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 650.987ms, lost 80%
--

On November 12th, I upgraded to Nagios 3.2.3 and the 1.4.15 plugins, and
got this later that evening. The host checks were only about 20 seconds
apart:

--
[11-12-2010 23:46:43] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;OK;SOFT;2;Web Sessions: 2
[11-12-2010 23:45:14] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 0.985ms, lost 0%
[11-12-2010 23:44:53] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 355.633ms, lost 80%
[11-12-2010 23:44:44] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;WARNING;SOFT;1;No data was received from host!
--

Two days later, it looked like it was behaving properly:

--
[11-14-2010 23:44:57] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 1.338ms, lost 0%
[11-14-2010 23:44:27] SERVICE ALERT: wwwhost;Service: Snare;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[11-14-2010 23:44:27] SERVICE ALERT: wwwhost;Service: RServer3;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[11-14-2010 23:43:34] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 860.577ms, lost 80%
[11-14-2010 23:43:22] SERVICE ALERT: wwwhost;Service: Epilog;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
--
[11-14-2010 08:56:55] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 2.633ms, lost 0%
[11-14-2010 08:55:45] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 518.822ms, lost 80%
[11-14-2010 08:55:36] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;WARNING;SOFT;1;No data was received from host!
--

Last night, however, the host got rechecked at short intervals:

--
[11-15-2010 23:56:09] HOST ALERT: wwwhost;UP;SOFT;3;WARNING - 10.3.1.11: rta 89.448ms, lost 40%
[11-15-2010 23:55:39] HOST ALERT: wwwhost;DOWN;SOFT;2;CRITICAL - 10.3.1.11: rta 984.594ms, lost 80%
[11-15-2010 23:55:21] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 738.100ms, lost 80%
[11-15-2010 23:55:09] SERVICE ALERT: wwwhost;CPU;WARNING;SOFT;1;No data was received from host!
[11-15-2010 23:54:00] HOST FLAPPING ALERT: wwwhost;STARTED; Host appears to have started flapping (23.0% change >= 20.0% threshold)
[11-15-2010 23:53:59] HOST ALERT: wwwhost;UP;HARD;1;WARNING - 10.3.1.11: rta 183.851ms, lost 60%
[11-15-2010 23:53:29] HOST ALERT: wwwhost;DOWN;HARD;3;CRITICAL - 10.3.1.11: rta nan, lost 100%
[11-15-2010 23:53:29] SERVICE

Re: [Nagios-users] Nagios Core 3.2.3 host check retry interval

2010-11-16 Thread Andreas Ericsson
On 11/16/2010 09:59 PM, Chris Beattie wrote:
 I noticed something curious.  It looks like Nagios 3.2.3 is making
 on-demand host checks faster than the retry_interval should allow.  The
 interval_length is set to 60 and the retry_interval is set to 1.  Nagios
 and the plugins were compiled from source on CentOS 5.5 x64.
 

Very curious indeed. The only thing I can see that might trigger something
like this is the following patch:

http://git.op5.org/git/?p=nagios.git;a=commitdiff;h=1149d275011d7c4d8631b44dbba30ebdb4d7e83f

That one was in 3.2.2 too though. Could you try un-commenting the lines
mentioned there and see if that helps? I won't revert that patch, but it
would give me a pretty good idea of where to start the bug-hunt.

Thanks.

