[Nagios-users] Some alerts not getting to sendmail
Good morning, or whatever as the case may be...

I have a Nagios 3.2.1 install which is showing a problem I'm unsure how to troubleshoot further. It's either something simple I'm missing, or a deeper, more difficult problem. Or a transient to be perhaps put on a shelf until it happens again.

First, the questions:

- Is the notifications log absolute?
  - Meaning, if a notification is shown in this log, it has passed all filters (notification options etc) and Nagios believes it was submitted to the MTA.
- Is there anywhere besides the MTA's log, status.dat and nagios.log to look for clues to mail problems?

== Details

- Running on FreeBSD 7.0, using stock sendmail on localhost.
- In general, everything is working fine. 125 hosts, 1600 ish services. This system has been up and stable for a few months. Host and service notifications of all kinds go out properly all the time.

Last night, I had a host go down. Notification got to my cell phone and the other contacts it's configured to just fine.

This morning, I dealt with the problem host and Nagios showed it back up. But no Host up notification to any of the configured contacts.

The Notifications log shows the host up notifications as having been sent.

There's nothing in /var/log/maillog for the time Nagios says the notifications were sent.

In status.dat, the record for my cell contact has a last_host_notification line with the epoch time version of the exact second the notification was in theory sent.

Host and template records included at the bottom of this email. I've included one contact def, but there were 4 contacts, using 2 different scripts that should have received the notification.

As far as I can see, there is nothing in the host configuration or related templates that would keep a host up notification from being sent.

We use custom host-notify scripts which log actions, and again, no entries for the specific problem, but lots of other notifications before and after. These scripts could be the problem, but I want to rule out other issues first.

Thank you for your time,
Tim Palmer

=== Host config:

define host{
    host_name                       host.foo.bar.tld
    use                             dslam
    alias                           Anytown DSLAM
    address                         xxx.xxx.xxx.xxx
    parents                         another.foo.bar.tld
}

define host{
    name                            dslam
    use                             generic-host
    check_period                    24x7
    check_interval                  5
    retry_interval                  1
    max_check_attempts              10
    check_command                   check_dslam_uptime_snmp
    notification_period             24x7
    notification_interval           0
    notification_options            d,u,r
    contact_groups                  contact1, contact2
    register                        0
}

define host{
    name                            generic-host
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               0
    retain_status_information       1
    retain_nonstatus_information    1
    notification_period             24x7
    register                        0
}

Contact:

define contact{
    contact_name                    me_text
    use                             text-contact
    alias                           me_text
    email                           npanxx_lf...@txt.smx.gateway
}

define contact{
    name                            text-contact
    use                             generic-contact
    service_notification_options    c,r,w
    service_notification_commands   notify-by-textmessage-service
    host_notification_commands      notify-by-textmessage-host
    register                        0
}

define contact{
    name                            generic-contact
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    c,r
    host_notification_options       d,r
    service_notification_commands   notify-by-email-service
    host_notification_commands      notify-by-email-host
    register                        0
}

--
Beautiful is writing same markup. Internet Explorer 9 supports standards for HTML5, CSS3, SVG 1.1, ECMAScript5, and DOM L2 L3. Spend less time writing and rewriting code and more time creating great experiences on the web. Be a part of the beta today
http://p.sf.net/sfu/msIE9-sfdev2dev
___
Nagios-users mailing list
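For anyone wanting to line up the last_host_notification value in status.dat with /var/log/maillog, the epoch value has to be turned into a readable time first. The following is only a rough sketch: it assumes the Nagios 3.x status.dat layout of `contactstatus { key=value ... }` blocks, and the sample text and epoch value in it are made up for illustration, not taken from Tim's system.

```python
import re
from datetime import datetime, timezone

# Assumed block header form in status.dat, e.g. "contactstatus {".
BLOCK_RE = re.compile(r"^\s*(\w+)\s*\{\s*$")


def parse_status_blocks(text):
    """Yield (block_type, {key: value}) for each block found in the text."""
    block_type, fields = None, {}
    for line in text.splitlines():
        stripped = line.strip()
        if block_type is None:
            match = BLOCK_RE.match(line)
            if match:
                block_type, fields = match.group(1), {}
        elif stripped == "}":
            yield block_type, fields
            block_type = None
        elif "=" in stripped:
            key, _, value = stripped.partition("=")
            fields[key] = value


def last_host_notifications(text):
    """Map contact_name to the UTC time of its last host notification."""
    result = {}
    for block_type, fields in parse_status_blocks(text):
        if block_type != "contactstatus":
            continue
        epoch = int(fields.get("last_host_notification", "0"))
        # An epoch of 0 is treated here as "never notified".
        result[fields.get("contact_name", "?")] = (
            datetime.fromtimestamp(epoch, tz=timezone.utc) if epoch else None
        )
    return result


# Hypothetical sample; real files carry many more fields per block.
SAMPLE = """\
contactstatus {
\tcontact_name=me_text
\tlast_host_notification=1290091680
\tlast_service_notification=0
\t}
hoststatus {
\thost_name=host.foo.bar.tld
\t}
"""

if __name__ == "__main__":
    for name, when in last_host_notifications(SAMPLE).items():
        print(name, when.isoformat() if when else "never")
```

The times come out in UTC, so they still need shifting to the server's local zone before comparing against maillog timestamps.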
Re: [Nagios-users] Some alerts not getting to sendmail
On 11/18/2010 03:48 PM, Tim Palmer wrote:
> First, the questions:
> - Is the notifications log absolute?
>   - Meaning, if a notification is shown in this log, it has passed all filters (notification options etc) and Nagios believes it was submitted to the MTA.

Yes.

> - Is there anywhere besides the MTA's log, status.dat and nagios.log to look for clues to mail problems?

The receiving end comes to mind, or any server(s) in between.

> We use custom host-notify scripts which log actions, and again, no entries for the specific problem, but lots of other notifications before and after. These scripts could be the problem, but I want to rule out other issues first.

Notifications are a pretty integral part of what makes Nagios worth anything at all. Since you're using homebrewed scripts and no one else has reported any problems with them, I suggest you first debug your own scripts, or enable debug-logging for notifications. The docs will tell you how to do that. It won't help for this occurrence of the failed notifications, but it will definitely help you in the future if it ever happens again.

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace.

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] Some alerts not getting to sendmail
Andreas Ericsson wrote:
> On 11/18/2010 03:48 PM, Tim Palmer wrote:
>> - Is the notifications log absolute?
>>   - Meaning, if a notification is shown in this log, it has passed all filters (notification options etc) and Nagios believes it was submitted to the MTA.
>
> Yes.

Excellent, thank you. That's the critical bit for me regarding Nagios.

> Notifications are a pretty integral part of what makes Nagios worth anything at all. Since you're using homebrewed scripts and no one else has reported any problems with them, I suggest you first debug your own scripts, or enable debug-logging for notifications. The docs will tell you how to do that. It won't help for this occurrence of the failed notifications, but it will definitely help you in the future if it ever happens again.

Agreed on all counts. Now that you've confirmed the final-ness of the notifications log, I am comfortable looking outside Nagios to the scripts, system and sendmail.

I'm sure there's a reasonable, logical explanation for a small subset of mail not getting from Nagios to the local MTA...

Thank you
Tim
Re: [Nagios-users] Some alerts not getting to sendmail
Tim Palmer wrote:
> Agreed on all counts. Now that you've confirmed the final-ness of the notifications log, I am comfortable looking outside Nagios to the scripts, system and sendmail.
>
> I'm sure there's a reasonable, logical explanation for a small subset of mail not getting from Nagios to the local MTA...

Note to self, and whoever else might be listening - properly quoting plugin output before releasing into the shell is a Good Thing. Blaming ex-employees for the oversight is tempting, but cowardly.

And Tim's Trouble Shooting Rule #1 holds again - It's your fault, find what you did wrong or, Never forget you're an idiot.

Tim
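The notification scripts themselves never appear in the thread, so the following is only a sketch of the general failure mode Tim describes, not his actual code. It assumes a POSIX shell (as on FreeBSD), a made-up plugin output string, and a small child Python process standing in for the mailer so that nothing depends on sendmail being installed. An apostrophe in unquoted plugin output is enough to make the shell reject the whole command, which would leave neither a script log entry nor a maillog entry.

```python
import shlex
import subprocess
import sys

# Hypothetical plugin output; the apostrophe and semicolon cause the trouble.
PLUGIN_OUTPUT = "DSLAM uptime OK - it's back; 3 days up"

# Stand-in for the real mailer: prints its first argument and exits.
ECHO = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
ECHO_PREFIX = " ".join(shlex.quote(part) for part in ECHO)


def send_unquoted(message):
    """Interpolate the message straight into a shell command (the bug)."""
    command = ECHO_PREFIX + " " + message
    return subprocess.run(command, shell=True, capture_output=True, text=True)


def send_quoted(message):
    """Same shell command, with the message escaped by shlex.quote."""
    command = ECHO_PREFIX + " " + shlex.quote(message)
    return subprocess.run(command, shell=True, capture_output=True, text=True)


def send_without_shell(message):
    """Avoid the shell entirely by passing an argument list."""
    return subprocess.run(ECHO + [message], capture_output=True, text=True)


if __name__ == "__main__":
    for send in (send_unquoted, send_quoted, send_without_shell):
        done = send(PLUGIN_OUTPUT)
        print(send.__name__, "exit", done.returncode, repr(done.stdout.strip()))
```

Of the two working variants, passing an argument list is usually the simpler choice, since there is then no shell parsing step for the plugin output to upset.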