Re: [Nagios-users] Notification after Acknowledgment
On 13 May 2011 08:34, Andre Kruger andre.kru...@trw.com wrote:

> Hi
>
> Can you guys please give me your input on how you handle the following situation.
>
> Let's take monitoring a disk as an example. For argument's sake, let's say that when the disk reaches 80% capacity I send out a warning, and at 90% I send out a critical. There is also a Service Escalation configured to send out notifications when this service reaches critical.
>
> So at 80 percent I get my notification and all is well. I then go ahead and acknowledge the event, and in doing so Nagios will not send out any further notifications, which according to the Nagios logic is correct.
>
> The problem is that if the disk in the meantime reaches critical (90% capacity), I won't get another notification. Not even the Service Escalation helps here, because the event has already been acknowledged.
>
> Do you guys have any suggestions on how this problem can be solved?
>
> Regards
> Andre

The approach I sometimes use for prolonged issues like this is to acknowledge the alert and then raise the warning and critical thresholds in Nagios. The problem with this approach is that Nagios then reports the status as OK, which might give a false impression to other users. It is also important to remember to reduce the warning threshold back to its usual level once the issue is resolved.

For issues which might be fast-moving, I would suggest that it is not appropriate to acknowledge the issue unless you are in a position to manage it actively until resolution.

--
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know.
Learn how Intel has extended the reach of its next-generation tools to help boost performance applications - including clusters.
http://p.sf.net/sfu/intel-dev2devmay
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
Re: [Nagios-users] Notification after Acknowledgment
Hi

Thanks for that. I just read how non-sticky acknowledgments work from 3.2.3. I think this solves my problem.

http://wiki.nagios.org/index.php/Acknowledgementlogic

Assuming you have a service with notifications enabled for all states with a max retry attempts of 1, these are the notifications you should get based on the following transitions:

  service in OK
  service goes into WARNING - notification sent
  non-sticky acknowledgement applied
  service goes into CRITICAL. Acknowledgement removed. Notification sent
  non-sticky acknowledgement applied
  service goes into WARNING. Acknowledgement removed. Notification sent
  non-sticky acknowledgement applied
  service goes into CRITICAL. Acknowledgement removed. Notification sent
  service goes into OK. Recovery notification sent

Please consider your environmental responsibility before printing this e-mail or any other document. Ask yourself whether you need a hard copy.
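For anyone who wants to check the transition list above against sticky behaviour, here is a minimal sketch of the logic as a toy state machine. It is a simplified model built only from the sequence quoted in this message, not Nagios code: the `Service` class, the `replay` helper and the `"ACK"` event marker are all made up for illustration, and it assumes max retry attempts of 1 so that every state change is a hard state change.

```python
from dataclasses import dataclass, field

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


@dataclass
class Service:
    """Toy model of one service; names here are hypothetical, not Nagios internals."""
    state: str = OK
    acknowledged: bool = False
    sticky: bool = False
    notifications: list = field(default_factory=list)

    def acknowledge(self, sticky):
        # Only a problem state can be acknowledged.
        if self.state != OK:
            self.acknowledged = True
            self.sticky = sticky

    def change_state(self, new_state):
        if new_state == self.state:
            return
        self.state = new_state
        if new_state == OK:
            # Recovery clears any acknowledgement and sends a recovery notice.
            self.acknowledged = False
            self.notifications.append("RECOVERY")
            return
        # A non-sticky acknowledgement is removed on any state change.
        if self.acknowledged and not self.sticky:
            self.acknowledged = False
        # Problem notifications go out only while unacknowledged.
        if not self.acknowledged:
            self.notifications.append(new_state)


def replay(events, sticky):
    """Apply state changes and "ACK" markers in order; return the notifications sent."""
    svc = Service()
    for event in events:
        if event == "ACK":
            svc.acknowledge(sticky)
        else:
            svc.change_state(event)
    return svc.notifications


# The sequence from the message above.
EVENTS = [WARNING, "ACK", CRITICAL, "ACK", WARNING, "ACK", CRITICAL, OK]

if __name__ == "__main__":
    print("non-sticky:", replay(EVENTS, sticky=False))
    print("sticky:    ", replay(EVENTS, sticky=True))
```

In this model the non-sticky run sends a notification on every change of problem state, while the sticky run stays silent between the first WARNING and the recovery, which is the situation described in the original question.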
Re: [Nagios-users] Notification after Acknowledgment
What's the purpose of acknowledging the service problems? Just to suppress the notifications, or something else?
Re: [Nagios-users] Notification after Acknowledgment
Andre,

I wouldn't acknowledge it unless you plan to actually do something about it. I use escalations which instigate callouts to engineers. When the on-call engineer acks an alert it means they are investigating. It would surely be pointless to ack something which you aren't going to do anything about.

Also, think about why you'd ack at 80% if it's just a warning. We have thresholds of 85% for disk usage warnings but in all honesty it's there as exactly that - a warning. We don't send notifications for warnings on disk usage. We just monitor via the web interface. Notifications are sent on critical alerts because that is the time when action needs to be taken and users need to pay attention.

But this is all based on our requirements rather than yours, so this is just my tuppence worth!

Regards,
Deborah
Re: [Nagios-users] Notification after Acknowledgment
Yes, it was to stop the notifications, but I would then like to receive notifications again when the service that was acknowledged goes into a critical state. But non-sticky acknowledgments have solved this problem for me. I think I am going to change my default to non-sticky.
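If acknowledgements are submitted by script rather than through the web interface, the sticky choice is one field of the `ACKNOWLEDGE_SVC_PROBLEM` external command. The sketch below only builds the command line. It assumes the documented field order (host, service, sticky, notify, persistent, author, comment) and that a sticky value of 2 means sticky while 1 means non-sticky. The helper name `ack_service_command`, the host `fileserver01` and the service `Disk /` are hypothetical.

```python
import time

STICKY = 2      # acknowledgement stays until the service returns to OK
NON_STICKY = 1  # acknowledgement is removed on the next state change


def ack_service_command(host, service, author, comment,
                        sticky=NON_STICKY, notify=True, persistent=False,
                        now=None):
    """Build one ACKNOWLEDGE_SVC_PROBLEM line (hypothetical helper, not a Nagios API)."""
    fields = [host, service, author, comment]
    if any("\n" in f for f in fields):
        raise ValueError("fields must not contain newlines")
    timestamp = int(time.time() if now is None else now)
    parts = [
        "ACKNOWLEDGE_SVC_PROBLEM",
        host,
        service,
        str(sticky),
        str(int(notify)),
        str(int(persistent)),
        author,
        comment,
    ]
    return "[{}] {}\n".format(timestamp, ";".join(parts))


if __name__ == "__main__":
    # Example values only; host, service and author are made up.
    print(ack_service_command("fileserver01", "Disk /", "andre",
                              "looking into it"), end="")
```

The resulting line would be written to the Nagios command file, whose location depends on the `command_file` setting in nagios.cfg for the installation in question.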
Re: [Nagios-users] Notification after Acknowledgment
On 13 May 2011 09:01, Andre Kruger andre.kru...@trw.com wrote:

> I just read how non-sticky acknowledgments work from 3.2.3. I think this solves my problem.
>
> http://wiki.nagios.org/index.php/Acknowledgementlogic

Neat! Thanks, I hadn't noticed that.