Re: [icinga-users] my issues about notifcations

Michael Friedrich Sat, 01 Nov 2014 14:15:54 -0700

On 01.11.2014 at 21:32, Darko Hojnik wrote:

Hi there,


For some days now I have been trying to configure notifications to meet our 
demands, without success. I am writing this first to ask for help, but also as 
feedback, not to stir up a shitstorm or another kind of stupid flamewar. So I am 
now asking myself: are these maybe bugs, missing features, or a misconfiguration? 
Should I give up? Is Icinga 2 stable and good enough for my company?

The first task is that on 97% of our infrastructure it is necessary that people 
get notified only if a service goes into a critical state, and get notified 
again when the event has been acknowledged or has gone back to the state OK. 
That sounds like a simple task, but it is not.

I do understand that it's sometimes frustrating when something does not work, and no one is there to take a look or help out.

But you should also consider what others might be doing, or why there is no simple answer.

I for myself am in the middle of writing Icinga 2 documentation (yes, right now too, just because we're behind our time schedule for the 2.2 release), planning the 1.12 release, and organizing a talk for OSMC in 2.5 weeks. Other Icinga developers might just have enough to do on their own.

And your problem doesn't sound simple. If I got that correctly, you're using an 8 node cluster (did you see my reply to your thread some hours ago? there was a question there too).

When a problem is not simple, it will take everyone _time_. Time to read, understand, reproduce, and write an answer you will enjoy or get an idea from.

Still, it's open source software, and many of us do it for fun when not at work (at netways it's also fun at work, but that's different). Nagging us or bumping questions won't help much, unless it sounds mission critical or someone finds it challenging to dig into the problem, like the 1.x core reload problem recently.


Nevertheless, you should consider that answers may take a while over here.

If you think you've found a bug, then please do report one. The developers (not only me) appreciate any feedback and possible issues.

Still, the same as before applies: if no one looks into it or comments on it, it's a matter of todos and time. If you want to sponsor development time for resolving bugs, sure, just do it.


We are running FreeBSD 10 and Icinga 2.1.1 on our nodes. Almost all of our 
servers run FreeBSD.

Some hopefully helpful information:

my template,

template Notification "mail-service-notification" {
   command = "mail-service-notification"

   states = [ OK, Warning, Critical, Unknown, ]

   types = [ Problem, Acknowledgement, Recovery, Custom,
             FlappingStart, FlappingEnd,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
}

my commands,

object NotificationCommand "mail-host-notification" {
   import "plugin-notification-command"

   command = [ SysconfDir + "/icinga2/scripts/mail-host-notification.sh" ]

   env = {
     NOTIFICATIONTYPE = "$notification.type$"
     HOSTALIAS = "$host.display_name$"
     HOSTADDRESS = "$address$"
     HOSTSTATE = "$host.state$"
     LONGDATETIME = "$icinga.long_date_time$"
     HOSTOUTPUT = "$host.output$"
     NOTIFICATIONAUTHORNAME = "$notification.author$"
     NOTIFICATIONCOMMENT = "$notification.comment$"
     HOSTDISPLAYNAME = "$host.display_name$"
     USEREMAIL = "$user.email$"
   }
}

object NotificationCommand "sms-host-notification" {
   import "plugin-notification-command"

   command = [ SysconfDir + "/icinga2/scripts/sms-host-notification.sh" ]

   env = {
     NOTIFICATIONTYPE = "$notification.type$"
     HOSTALIAS = "$host.display_name$"
     HOSTADDRESS = "$address$"
     HOSTSTATE = "$host.state$"
     LONGDATETIME = "$icinga.long_date_time$"
     SHORTDATETIME = "$icinga.short_date_time$"
     HOSTOUTPUT = "$host.output$"
     NOTIFICATIONAUTHORNAME = "$notification.author$"
     NOTIFICATIONCOMMENT = "$notification.comment$"
     HOSTDISPLAYNAME = "$host.display_name$"
     PAGER = "$user.pager$"
   }
}

object NotificationCommand "mail-service-notification" {
   import "plugin-notification-command"

   command = [ SysconfDir + "/icinga2/scripts/mail-service-notification.sh" ]

   env = {
     NOTIFICATIONTYPE = "$notification.type$"
     SERVICEDESC = "$service.name$"
     HOSTALIAS = "$host.display_name$"
     HOSTADDRESS = "$address$"
     SERVICESTATE = "$service.state$"
     LONGDATETIME = "$icinga.long_date_time$"
     SERVICEOUTPUT = "$service.output$"
     NOTIFICATIONAUTHORNAME = "$notification.author$"
     NOTIFICATIONCOMMENT = "$notification.comment$"
     HOSTDISPLAYNAME = "$host.display_name$"
     SERVICEDISPLAYNAME = "$service.display_name$"
     USEREMAIL = "$user.email$"
   }
}

object NotificationCommand "sms-service-notification" {
   import "plugin-notification-command"

   command = [ SysconfDir + "/icinga2/scripts/sms-service-notification.sh" ]

   env = {
     NOTIFICATIONTYPE = "$notification.type$"
     SERVICEDESC = "$service.name$"
     HOSTALIAS = "$host.display_name$"
     HOSTADDRESS = "$address$"
     SERVICESTATE = "$service.state$"
     LONGDATETIME = "$icinga.long_date_time$"
     SHORTDATETIME = "$icinga.short_date_time$"
     SERVICEOUTPUT = "$service.output$"
     NOTIFICATIONAUTHORNAME = "$notification.author$"
     NOTIFICATIONCOMMENT = "$notification.comment$"
     HOSTDISPLAYNAME = "$host.display_name$"
     SERVICEDISPLAYNAME = "$service.display_name$"
     PAGER = "$user.pager$"
   }
}

my inherited notifications

apply Notification "mail-service-standard-first-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical, OK ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 0
   times.begin = 3m
}

apply Notification "mail-service-standard-re-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical, OK ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 120m
   times.begin = 3m
}

So, as described above, notifications should be sent for the states critical and 
recovery. But recovery notifications have also been sent when the state had only 
been warning.

I would remove times.begin here; it doesn't make much sense regarding your example, and will certainly interfere with testing when notifications are sent.


Other than that:

Your assumption about when a recovery occurs is wrong. There never was any difference between warning, critical and unknown: they are all assumed to be NOT-OK, and will trigger a hard state change, no matter which state change increased the check attempt counter before.


So basically, a notification can even happen like this:

OK (soft) -> WARNING (1) -> CRITICAL (2) -> WARNING (3) ---> hard state change

But your users (contacts in Icinga 1/Nagios) wouldn't be notified, just because their filters don't allow Warning conditions.

In a similar fashion, a recovery only happens when returning from NOT-OK to OK. Icinga 1.x and 2.x don't store information on who was notified by which state/type filter, so they cannot take that into account upon service recovery when deciding who finally gets a notification.

Feel free to put or sponsor a feature request, but I don't think that this will be an easy task to implement.

You're not alone, though: that topic has been discussed for many years now, but no one has fixed it (only external addons such as NoMa do, with their own filter capabilities).

So basically Icinga 1.x works like Icinga 2.x in this regard, but most likely you don't know Icinga 1.x or Nagios and haven't encountered this feature, or shortcoming. That's a per-user opinion.
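
To make the state sequence above more concrete, here is a minimal sketch. It is not taken from the original configuration: the template name "sla-service" and the interval values are assumptions for illustration, while max_check_attempts, check_interval and retry_interval are regular Service attributes.

```icinga2
template Service "sla-service" {
   // hypothetical template name and values, for illustration only
   max_check_attempts = 3   // third consecutive NOT-OK result => hard state change
   check_interval = 5m      // regular check interval
   retry_interval = 1m      // shorter re-check interval while the state is soft
}
```

With such a service, WARNING, CRITICAL and WARNING in a row add up to three NOT-OK attempts. A notification with states = [ Critical, OK ] stays silent for the resulting hard WARNING state, but the later return to OK still matches the OK/Recovery filter, which is how a recovery mail can arrive without a preceding problem mail.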


apply Notification "mail-service-standard-first-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 0
   times.begin = 3m
}

apply Notification "mail-service-standard-re-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 120m
   times.begin = 3m
}

With the configuration above, notifications were sent for the state critical, 
but not a single notification for the recovery.

In conclusion this means that people get service messages with state OK even 
if they have previously seen no messages, after the service had been in the 
state warning. That confuses people and reduces the acceptance of Icinga 2.

How can I achieve that notifications are sent for the state critical, followed 
by a recovery message once the issue no longer exists? That's one of the things 
that I highly want.


The second issue is that notifications are not sent at the right time.


apply Notification "mail-service-standard-re-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 120m
   times.begin = 3m
}

The first notification arrived more than 30 minutes late, and the 
re-notification was also sent very late in my test environment. And 
sometimes nothing happened at all.

[2014-10-30 19:50:18 +0000] notice/Notification: Not sending notifications for notification object 'dev3!Raid Status!mail-service-standard': before escalation range
[2014-10-30 19:50:18 +0000] notice/Notification: Not sending notifications for notification object 'dev3!Raid Status!sms-standard': before escalation range

[2014-10-30 19:55:15 +0000] information/NotificationComponent: Sending reminder notification for object 'dev3!Raid Status'
[2014-10-30 19:55:15 +0000] notice/Notification: Not sending notifications for notification object 'dev3!Raid Status!sms-standard': before escalation range

Could you explain a bit more how these lines correlate to your observations in your test lab? The only thing I do see here is that times.begin skips the initial notification. It does not delay it though.
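
For context, times.begin and times.end describe an escalation range, which is normally put on an additional notification object rather than on the primary one. A rough sketch, assuming a hypothetical user group "oncall-managers" and purely illustrative durations:

```icinga2
apply Notification "mail-service-escalation" to Service {
   import "mail-service-notification"

   // hypothetical group, not part of the original setup
   user_groups = [ "oncall-managers" ]
   assign where service.vars.sla == "24x7"

   interval = 30m

   // escalation range: this object only notifies while the problem
   // is roughly between 1h and 4h old; earlier reminders are skipped
   // ("before escalation range" in the log)
   times = {
      begin = 1h
      end = 4h
   }
}
```

The primary notification object would carry no times attribute at all, so that its first notification goes out right away.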



So my cruel crappy hack is

apply Notification "mail-service-standard-first-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 0
   times.begin = 3m
}

apply Notification "mail-service-standard-re-notification" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"
   states = [ Critical ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 10m
   times.begin = 3m
}

That forces a notification to be sent directly and every 10 minutes afterwards. 
Is this really the way to configure notifications at the right time?

As said, I would remove the escalation range (times.begin), keep in mind that not-ok states will always generate recovery notifications, and then test again.
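
A minimal sketch of that suggestion, under the assumption that a single notification object is enough for both the first mail and the reminders. The object name is made up; all other values are copied from the configuration quoted above.

```icinga2
apply Notification "mail-service-standard" to Service {
   import "mail-service-notification"

   user_groups = [ "trivago-admins-email" ]
   assign where service.vars.sla == "24x7"

   // OK stays in the list; with states = [ Critical ] alone
   // no recovery notification was sent in the test above
   states = [ Critical, OK ]
   types = [ Problem, Acknowledgement, Recovery, Custom,
             DowntimeStart, DowntimeEnd, DowntimeRemoved ]

   period = "24x7"
   interval = 120m   // reminder every two hours; 0 would disable reminders
   // no times.begin: no escalation range, so the first notification is not skipped
}
```

With this variant the recovery behaviour described earlier still applies: a service that only ever reached a hard WARNING state produces no problem mail, but does produce a recovery mail.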


Kind regards,
Michael



thank you all for any help, hints or suggestions.


cheers

Darko Hojnik
Datacenter Operations

[email protected]
www.trivago.com


Court of Registration: Amtsgericht Duesseldorf, Registration Number: HRB 51842
Managing directors: Rolf Schroemgens • Malte Siewert • Peter Vinnemeier
trivago GmbH • Bennigsen-Platz 1 • 40474 Duesseldorf, Germany
* This email message may contain legally privileged and/or confidential 
information.
You are hereby notified that any disclosure, copying, distribution, or use of 
this email message is strictly prohibited.

_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users



--
DI (FH) Michael Friedrich

[email protected]  || icinga open source monitoring
https://twitter.com/dnsmichi || lead core developer
[email protected]       || https://www.icinga.org/team
irc.freenode.net/icinga      || dnsmichi