Hi All,

We had an event several weeks ago, and it has happened again. I posted after the first event and am reposting to the same thread because the two are related. Sorry if this makes the thread confusing. Marc, thanks for the reply about the first event; see below for my responses.

In the second event a host went down due to a power outage, but only some of the contacts were sent notifications (the same problem as in the first event). This is the second time, that I know of, that Nagios has failed to send to some of the contacts. The problem has now occurred on two different hosts. I can't explain why it's happening, which does not instill confidence in our customers. Any help or suggestions for fixing this are greatly appreciated.

This next part is from/for the first event.

-------------------- first event --------------------------------------

>> There was a routing issue on our WAN that caused this event; the SMTP server we use is across the WAN. Could the routing issue have prevented some of the SMTP notifications from being sent? Wouldn't they just queue up and go once the problem was resolved?

> They would be queued by the SMTP server running on your nagios machine. Redelivery attempts would occur based on the configuration there.

Okay, makes sense.

>> I have seen messages that did not arrive at the recipient's phone, but I've never seen Nagios not generate notifications for contacts that are configured for that host or service. Has anyone else seen this? Any suggestions on a cause or how to troubleshoot?

> - Check nagios.log for a HOST NOTIFICATION event for that group. Make sure there were no errors logged.

nagios.log only shows notifications sent to some of the contacts; those notifications were received.

> - Check your local SMTP server logs to see if the messages were received there and no errors were reported.

Not necessary; Nagios did not send the notifications.

> - Make sure that nagios has been restarted since adding this group and contacts.

Done. The contact groups in question have been in place for many months.

> - Make sure you don't have multiple nagios daemons running at the same time.

Done. Only a single instance is running.

----------------------- end of first event ---------------------------------

------------------------ Second event with logs and configs -----------------

Below are the configs for the host from the second event. If you look at the log at the bottom, you'll see that 11 of 16 contacts were sent notifications: some, but not all, from each of the configured contact groups. I'm trying to figure out why. Does anyone see a problem with my configs?

Host in question:

CONFIGS:

define host {
    host_name        Host_A
    alias            Host_A
    parents          Host_B
    use              upshost
    contact_groups   +network-email,onguard
    register         1
}

define contactgroup {
    contactgroup_name   network-email
    alias               Users who monitor the network - email only
    members             netuser1,netuser2,netuser3
}

define contactgroup {
    contactgroup_name   onguard
    alias               On Guard Admins
    members             og_user1-phone,og_user2-phone,og_user3,og_user3-home,og_user3-phone,og_user4,og_user4-phone,og_user5-phone,og_user6,og_user6-phone,og_user7,og_user7-phone,og_user8
}

define host {
    name                           upshost
    alias                          NetInfra UPS' template
    check_command                  check-host-alive
    use                            generic-pnp,generic-host
    max_check_attempts             5
    check_interval                 60
    retry_interval                 3
    active_checks_enabled          1
    passive_checks_enabled         1
    flap_detection_enabled         1
    process_perf_data              1
    retain_status_information      1
    retain_nonstatus_information   1
    contact_groups                 network
    notification_interval          60
    notification_period            24x7
    notification_options           d,u,r
    notifications_enabled          1
    register                       0
}

Excerpt from nagios.log:

[1283265540] HOST NOTIFICATION: netuser2-cell;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2-pager;Host_A;UNREACHABLE;alert-host-by-modem;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user8;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7-phone;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6-phone;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user4;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3-home;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283266180] HOST ALERT: Host_A;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: netuser2-cell;Host_A;UP;alert-host-by-sms;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: netuser2-pager;Host_A;UP;alert-host-by-modem;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: netuser2;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: og_user8;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user7-phone;Host_A;UP;alert-host-by-sms;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user7;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user6-phone;Host_A;UP;alert-host-by-email-short;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user6;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user4;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266182] HOST NOTIFICATION: og_user3-home;Host_A;UP;alert-host-by-email-short;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266182] HOST NOTIFICATION: og_user3;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms

--------------------- end of second event -------------------------------------------
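
To make the gap concrete, the configured members can be diffed against the contacts that actually appear in HOST NOTIFICATION entries. The following is a minimal Python sketch, not part of Nagios: the function names (notified_contacts, missing_contacts) and the EXPECTED and SAMPLE_LOG constants are hypothetical, the log format is assumed from the excerpt above, and EXPECTED covers only the two contact groups whose definitions are shown (the inherited "network" group is not shown, so it is left out).

```python
import re
from collections import defaultdict

# Assumed line format, taken from the nagios.log excerpt in this post:
# [timestamp] HOST NOTIFICATION: contact;host;state;command;plugin output
NOTIFICATION_RE = re.compile(
    r"^\[(?P<ts>\d+)\] HOST NOTIFICATION: "
    r"(?P<contact>[^;]+);(?P<host>[^;]+);(?P<state>[^;]+);(?P<command>[^;]+);"
)

def notified_contacts(log_lines, host):
    """Return {state: set(contacts)} for HOST NOTIFICATION entries of one host."""
    seen = defaultdict(set)
    for line in log_lines:
        match = NOTIFICATION_RE.match(line.strip())
        if match and match.group("host") == host:
            seen[match.group("state")].add(match.group("contact"))
    return dict(seen)

def missing_contacts(expected, log_lines, host, state):
    """Return the expected contacts with no notification logged for host/state."""
    logged = notified_contacts(log_lines, host).get(state, set())
    return sorted(set(expected) - logged)

# Members of network-email and onguard as listed in the config above.
EXPECTED = [
    "netuser1", "netuser2", "netuser3",
    "og_user1-phone", "og_user2-phone", "og_user3", "og_user3-home",
    "og_user3-phone", "og_user4", "og_user4-phone", "og_user5-phone",
    "og_user6", "og_user6-phone", "og_user7", "og_user7-phone", "og_user8",
]

# The UNREACHABLE notifications from the excerpt above.
SAMPLE_LOG = """\
[1283265540] HOST NOTIFICATION: netuser2-cell;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2-pager;Host_A;UNREACHABLE;alert-host-by-modem;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user8;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7-phone;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6-phone;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user4;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3-home;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
"""

if __name__ == "__main__":
    for contact in missing_contacts(EXPECTED, SAMPLE_LOG.splitlines(), "Host_A", "UNREACHABLE"):
        print(contact)
```

Run against the excerpt, this lists netuser1, netuser3, og_user1-phone, og_user2-phone, og_user3-phone, og_user4-phone and og_user5-phone as configured but not notified. To check a full log instead of the embedded sample, the lines of nagios.log can be passed in place of SAMPLE_LOG.splitlines().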