[Nagios-users] Some alerts not getting to sendmail

2010-11-18 Thread Tim Palmer
Good morning, or whatever as the case may be...

I have a Nagios 3.2.1 install which is showing a problem I'm unsure how 
to troubleshoot further. It's either something simple I'm missing, or a 
deeper, more difficult problem. Or a transient, perhaps to be put on a 
shelf until it happens again.

First, the questions:
- Is the notifications log absolute?
  - Meaning, if a notification is shown in this log, it has passed all 
    filters (notification options etc) and Nagios believes it was submitted 
    to the MTA.

- Is there anywhere besides the MTA's log, status.dat and nagios.log to 
  look for clues to mail problems?
==
Details
- Running on FreeBSD 7.0, using stock sendmail on localhost.
- In general, everything is working fine. 125 hosts, 1600-ish services. 
This system has been up and stable for a few months.

Host and service notifications of all kinds go out properly all the time.

Last night, I had a host go down. The notification got to my cell phone 
and the other configured contacts just fine. This morning, I dealt 
with the problem host and Nagios showed it back up, but no host-up 
notification reached any of the configured contacts. The Notifications log 
shows the host-up notifications as having been sent. There's nothing in 
/var/log/maillog for the time Nagios says the notifications were sent. 
In status.dat, the record for my cell contact has a 
last_host_notification line with the epoch timestamp of the exact 
second the notification was, in theory, sent. Host and template records 
are included at the bottom of this email. I've included one contact def, 
but there were 4 contacts, using 2 different scripts, that should have 
received the notification.
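
To compare the status.dat value against /var/log/maillog, the epoch number has to be turned into a readable time first. The following is a minimal Python sketch of that step, not part of the original post. The sample line and its value are hypothetical (the real value was not posted), and the helper names are made up for illustration. It prints UTC so the result is reproducible; maillog entries are normally in local time, so drop the tz argument to compare directly.

```python
from datetime import datetime, timezone


def parse_status_line(line):
    """Split a 'key=value' line from status.dat into (key, value)."""
    key, _, value = line.strip().partition("=")
    return key, value


def epoch_to_utc(epoch_seconds):
    """Render an epoch value (seconds) as a 'YYYY-MM-DD HH:MM:SS' UTC string."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical sample line; the actual value was not included in the post.
sample = "\tlast_host_notification=1290091680\n"
key, value = parse_status_line(sample)
print(key, epoch_to_utc(value))
```

With the readable time in hand, the maillog can be searched for entries in the same minute.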

As far as I can see, there is nothing in the host configuration or 
related templates that would keep a host up notification from being sent.

We use custom host-notify scripts which log actions, and again, no 
entries for the specific problem, but lots of other notifications before 
and after. These scripts could be the problem, but I want to rule out 
other issues first.

Thank you for your time,

Tim Palmer

===
Host config:
define host{
    host_name   host.foo.bar.tld
    use         dslam
    alias       Anytown DSLAM
    address     xxx.xxx.xxx.xxx
    parents     another.foo.bar.tld
}

define host{
    name                    dslam
    use                     generic-host
    check_period            24x7
    check_interval          5
    retry_interval          1
    max_check_attempts      10
    check_command           check_dslam_uptime_snmp
    notification_period     24x7
    notification_interval   0
    notification_options    d,u,r
    contact_groups          contact1, contact2
    register                0
}

define host{
    name                            generic-host
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               0
    retain_status_information       1
    retain_nonstatus_information    1
    notification_period             24x7
    register                        0
}


Contact:
define contact{
    contact_name    me_text
    use             text-contact
    alias           me_text
    email           npanxx_lf...@txt.smx.gateway
}
define contact{
    name                            text-contact
    use                             generic-contact
    service_notification_options    c,r,w
    service_notification_commands   notify-by-textmessage-service
    host_notification_commands      notify-by-textmessage-host
    register                        0
}
define contact{
    name                            generic-contact
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    c,r
    host_notification_options       d,r
    service_notification_commands   notify-by-email-service
    host_notification_commands      notify-by-email-host
    register                        0
}

--
Beautiful is writing same markup. Internet Explorer 9 supports
standards for HTML5, CSS3, SVG 1.1,  ECMAScript5, and DOM L2  L3.
Spend less time writing and  rewriting code and more time creating great
experiences on the web. Be a part of the beta today
http://p.sf.net/sfu/msIE9-sfdev2dev
___
Nagios-users mailing list

Re: [Nagios-users] Some alerts not getting to sendmail

2010-11-18 Thread Andreas Ericsson
On 11/18/2010 03:48 PM, Tim Palmer wrote:
> First, the questions:
> - Is the notifications log absolute?
>   - Meaning, if a notification is shown in this log, it has passed all
>     filters (notification options etc) and Nagios believes it was submitted
>     to the MTA.
 

Yes.

> - Is there anywhere besides the MTA's log, status.dat and nagios.log to
>   look for clues to mail problems?

The receiving end comes to mind, or any server(s) in between.

> We use custom host-notify scripts which log actions, and again, no
> entries for the specific problem, but lots of other notifications before
> and after. These scripts could be the problem, but I want to rule out
> other issues first.
 

Notifications are a pretty integral part of what makes Nagios worth
anything at all. Since you're using homebrewed scripts and no one else
has reported any problems with them, I suggest you first debug your
own scripts, or enable debug-logging for notifications. The docs will
tell you how to do that. It won't help for this occurrence of the
failed notifications, but it will definitely help you in the future
if it ever happens again.
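
One way to act on the "debug your own scripts" advice is to record every invocation of a notification command together with its exit status and stderr, so a silent failure leaves a trace. The following is a minimal Python sketch under stated assumptions, not something from this thread: the function name `run_and_log` and the logger name are hypothetical, and the two commands are stand-ins for a real notify script.

```python
import logging
import subprocess
import sys


def run_and_log(argv, logger):
    """Run a command, log its arguments, exit status and stderr, return the status."""
    logger.info("running: %r", argv)
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        logger.error("exit %d, stderr: %s",
                     result.returncode, result.stderr.strip())
    else:
        logger.info("exit 0")
    return result.returncode


logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("notify-wrapper")  # hypothetical logger name

# Stand-in commands; a real wrapper would be handed the notify script's argv.
ok = run_and_log([sys.executable, "-c", "pass"], log)
failed = run_and_log([sys.executable, "-c", "import sys; sys.exit(3)"], log)
print(ok, failed)
```

Passing the command as an argument list also keeps the shell out of the picture, so nothing in the arguments is reinterpreted on the way.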

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Some alerts not getting to sendmail

2010-11-18 Thread Tim Palmer


Andreas Ericsson wrote:
>> - Is the notifications log absolute?
>>   - Meaning, if a notification is shown in this log, it has passed all
>>     filters (notification options etc) and Nagios believes it was submitted
>>     to the MTA.
>
> Yes.

Excellent, thank you. That's the critical bit for me regarding Nagios.

   

> Notifications are a pretty integral part of what makes Nagios worth
> anything at all. Since you're using homebrewed scripts and no one else
> has reported any problems with them, I suggest you first debug your
> own scripts, or enable debug-logging for notifications. The docs will
> tell you how to do that. It won't help for this occurrence of the
> failed notifications, but it will definitely help you in the future
> if it ever happens again.

   
Agreed on all counts. Now that you've confirmed the final-ness of the 
notifications log, I am comfortable looking outside Nagios to the 
scripts, system and sendmail. I'm sure there's a reasonable, logical 
explanation for a small subset of mail not getting from Nagios to the 
local MTA...

Thank you

Tim




Re: [Nagios-users] Some alerts not getting to sendmail

2010-11-18 Thread Tim Palmer


Tim Palmer wrote:
> Agreed on all counts. Now that you've confirmed the final-ness of the
> notifications log, I am comfortable looking outside Nagios to the
> scripts, system and sendmail. I'm sure there's a reasonable, logical
> explanation for a small subset of mail not getting from Nagios to the
> local MTA...
>
> Thank you
>
> Tim
   

Note to self, and whoever else might be listening: properly quoting 
plugin output before releasing it into the shell is a Good Thing. Blaming 
ex-employees for the oversight is tempting, but cowardly.
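
The offending script was not posted, so the following is only an illustration of the general failure mode, written as a small Python sketch. The plugin output string is hypothetical, and `run_in_shell` is a helper invented for this example. It shows a command line that breaks when unquoted text containing an apostrophe reaches the shell, and the same command succeeding once the text is passed through `shlex.quote()`.

```python
import shlex
import subprocess


def run_in_shell(command_line):
    """Run a command line through /bin/sh and return (exit status, stdout)."""
    result = subprocess.run(["/bin/sh", "-c", command_line],
                            capture_output=True, text=True)
    return result.returncode, result.stdout


# Hypothetical plugin output containing shell metacharacters.
plugin_output = "SNMP OK - uptime 3 days; can't read ifTable (port 1|2)"

# Unquoted: the apostrophe opens a quote that is never closed, so the
# shell reports a syntax error and exits non-zero.
unsafe_status, unsafe_out = run_in_shell("printf '%s\\n' " + plugin_output)

# Quoted: shlex.quote() turns the whole text into one literal shell word.
safe_status, safe_out = run_in_shell(
    "printf '%s\\n' " + shlex.quote(plugin_output))

print("unquoted exit status is non-zero:", unsafe_status != 0)
print("quoted exit status:", safe_status)
print("quoted output:", safe_out.strip())
```

A failure like the unquoted case would fit the symptoms described earlier in the thread: Nagios logs the notification as sent, but the command dies before anything is handed to the MTA.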

And Tim's Troubleshooting Rule #1 holds again: "It's your fault, find 
what you did wrong", or, "Never forget you're an idiot."

Tim


