On 19.09.2013 16:29, Viranch Mehta wrote:
> Hi,
>
> I have been experiencing this weird problem with Icinga. At some random times,
> Icinga decides to not send me recovery alerts when service recovers after
> being in problem state.
>
> Following is what I see in logs (filter with SERVICE ALERT and SERVICE
> NOTIFICATION entries):
>
> [1379043382] SERVICE ALERT: host;service;UNKNOWN;SOFT;1;(Timed Out)
> [1379043482] SERVICE ALERT: host;service;UNKNOWN;SOFT;2;(Timed Out)
> [1379043582] SERVICE ALERT: host;service;UNKNOWN;HARD;3;(Timed Out)
> [1379043582] SERVICE NOTIFICATION: admin;host;service;UNKNOWN;notify-service-by-email;(Timed Out)
> [1379046062] SERVICE ALERT: host;service;OK;HARD;1;OK service ok
>
> (host & service masked)
>
> I don't understand how it directly went from UNKNOWN;HARD;3 to OK;HARD;1. I
> have max_check_attempts set to 3. Shouldn't it be OK;SOFT;1 -> OK;SOFT;2 ->
> OK;HARD;3 ?

That's a hard state recovery, as explained in
http://docs.icinga.org/latest/en/statetypes.html#hardstates
The actual hard state change already happened at UNKNOWN;HARD;3, when the
service reached max_check_attempts, and that also triggered the notification.
A recovery from a hard problem state goes straight to OK;HARD;1; there are no
soft OK steps in between.

> Also, there is only one notification after UNKNOWN;HARD;3, and none after it
> recovered at OK;HARD;1. I have w,u,c,r,f in notification_options of both
> contact and service check, and 24x7 timeperiods.

I'd like to see those object definitions from your objects.cache file.

> My SMTP server is also working fine as I am getting tons of alerts everyday.
> Problem is Icinga did not call notify-service-by-email command at all (so no
> question of SMTP itself).
>
> This is also happening for various other checks, following is another example:
>
> [1379043572] SERVICE ALERT: host;service2;UNKNOWN;SOFT;1;(Timed Out)
> [1379043662] SERVICE ALERT: host;service2;UNKNOWN;SOFT;2;(Timed Out)
> [1379043762] SERVICE ALERT: host;service2;UNKNOWN;HARD;3;(Timed Out)
> [1379043762] SERVICE NOTIFICATION: admin;host;service2;UNKNOWN;notify-service-by-email;(Service Check Timed Out)
> [1379046062] SERVICE ALERT: host;service2;CRITICAL;SOFT;1;Return code of 255 is out of bounds
> [1379046162] SERVICE ALERT: host;service2;OK;SOFT;2;OK: service ok
>
> Again no OK;HARD;3 state after OK;SOFT;2. and no notification after OK;SOFT;2
> either.

Notifications are triggered on hard state changes, not on soft state changes.
Event handlers, by contrast, are executed on *every* state change, soft or hard.
http://docs.icinga.org/latest/en/statetypes.html#softstates

You'll also see that the CRITICAL state was still SOFT (it had not reached
max_check_attempts). That is why the change to OK is a soft state recovery,
which does not trigger any notification.

Further, the log excerpt looks cut off between 1379043762 and 1379046062. That
is a gap of 2300 seconds, and at least one hard state recovery is missing
after the hard UNKNOWN.
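
To make the soft/hard rules above concrete, here is a minimal Python sketch that replays check results through a toy state machine. It is only an illustration under stated assumptions, not Icinga's actual code: the names `Service`, `process` and `replay` are hypothetical, the attempt numbering for soft recoveries simply mirrors what the logs above show, and the handling of a hard problem changing to a different hard problem is an assumption of the sketch. A `True` in the last field only means the notification logic would be triggered; whether a mail is really sent still depends on notification_options and timeperiods.

```python
from dataclasses import dataclass, field

OK = "OK"


@dataclass
class Service:
    """Toy model of one service's state. Hypothetical, not Icinga's code."""
    max_check_attempts: int = 3
    state: str = OK
    state_type: str = "HARD"
    attempt: int = 1
    log: list = field(default_factory=list)

    def _record(self, state, state_type, attempt, notifies):
        # One tuple per "SERVICE ALERT" line; notifies marks hard changes.
        self.log.append((state, state_type, attempt, notifies))

    def process(self, result: str) -> None:
        """Feed one check result (e.g. "OK", "CRITICAL", "UNKNOWN")."""
        if result == OK:
            if self.state == OK:
                return  # still OK, nothing to log
            if self.state_type == "HARD":
                # Hard recovery: logged as OK;HARD;1, notification triggered.
                self._record(OK, "HARD", 1, True)
            else:
                # Soft recovery: logged as SOFT, no notification.
                self._record(OK, "SOFT", self.attempt + 1, False)
            self.state, self.state_type, self.attempt = OK, "HARD", 1
            return

        if self.state != OK and self.state_type == "HARD":
            # Already in a hard problem state. Assumption of this sketch:
            # only a change to a different problem state is logged.
            if result != self.state:
                self.state = result
                self._record(result, "HARD", self.attempt, True)
            return

        # First failure after OK starts at attempt 1, retries count upwards.
        self.attempt = 1 if self.state == OK else self.attempt + 1
        self.state = result
        hard = self.attempt >= self.max_check_attempts
        self.state_type = "HARD" if hard else "SOFT"
        self._record(result, self.state_type, self.attempt, hard)


def replay(results, max_check_attempts=3):
    """Run a list of check results through a fresh Service, return its log."""
    svc = Service(max_check_attempts=max_check_attempts)
    for result in results:
        svc.process(result)
    return svc.log


if __name__ == "__main__":
    # First example: three UNKNOWN results, then OK.
    for entry in replay(["UNKNOWN", "UNKNOWN", "UNKNOWN", "OK"]):
        print(entry)
    # Tail of the second example: one CRITICAL, then OK.
    for entry in replay(["CRITICAL", "OK"]):
        print(entry)
```

In this model the first replay ends with a hard recovery at attempt 1 that triggers the notification logic, while the second one ends with a soft recovery that triggers nothing, which matches the two log excerpts.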
>
> I have been struggling with missing recovery alerts since some time now. After
> this I'm not even sure if I'm getting alerts for every problem either. It
> could as well miss a problem alert if it is missing recovery alerts.

I'd say you misunderstood hard and soft state changes and the actions they
trigger. I also suspect that max_check_attempts or the check/retry interval
was not chosen to match what you expect. Debug logs for notifications will
likely reveal filtering by notification options or periods that you normally
do not see.

You also didn't mention the versions involved, the installation method
(source or packages), or the distribution.

--
DI (FH) Michael Friedrich

mail: michael.friedr...@gmail.com
twitter: https://twitter.com/dnsmichi
jabber: dnsmi...@jabber.ccc.de
irc: irc.freenode.net/icinga dnsmichi

icinga open source monitoring
position: lead core developer
url: https://www.icinga.org

_______________________________________________
icinga-users mailing list
icinga-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/icinga-users