Matuskiewicz, Philip wrote:
> Hi Markus,
>
> I agree that it is the job of the plugin to determine the state of the 
> service, but only when it can return a definitive result.  The use case here 
> is very unique to the MTA.
>
> My plugin doesn't return 255 (it's set to return -1), Icinga automatically 
> sets it to 255 when it terminates the plugin after a timeout period (for a 
> reason that I haven't determined yet due to lack of log data).
>
> The problem is, when the plugin can't determine the current state, it tries 
> querying Icinga's status file (through MKLiveStatus for the Last State), and 
> if the monitoring server is under high load, this query times out also.  If 
> this fails, Icinga kills the plugin, the state of the service is set to 
> whatever the timeout state is in the configuration file (I set it to 
> Unknown).  The core ONLY lets you set the state to one of the 4 options (OK, 
> Warning, Critical, or Unknown), there is no way to not change the previous 
> state.

Even if the core sets an alarm signal (the infamous service check timeout 
warning), you can set such a signal in your plugin as well, applying your 
own timeout and returning the previous state afterwards.

e.g. perl-ish

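# Setting timeout ($NAME, %opt and $UNKNOWN are defined elsewhere in the plugin)
$SIG{ALRM} = sub {
    print "$NAME timed out after $opt{timeout} seconds\n";
    exit $UNKNOWN;    # or exit with the previous state instead
};
alarm $opt{timeout};  # keep this below the core's service check timeout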

So it's really up to you and your plugin to fix the overall behaviour 
before the core pulls the alarm trigger. Still, it does not hurt to 
increase the service check timeout to 120 seconds in various environments, 
giving the underlying check plugins a bigger time window to work in.
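
A rough sketch of the same idea in Python, purely as an illustration: the 
plugin arms its own alarm below the core's service check timeout and, if it 
fires, returns the previous state instead of UNKNOWN. The assumptions here 
are mine: the check command passes $LASTSERVICESTATEID$ as the first 
argument, and the names run_check and main, the 50 second default and the 
dummy check are all made up for the example.

```python
import signal
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


class PluginTimeout(Exception):
    """Raised by the SIGALRM handler when the plugin's own timer fires."""


def _on_alarm(signum, frame):
    raise PluginTimeout()


def run_check(check, previous_state, timeout=50):
    """Run check() under our own alarm and return (exit_code, message).

    check is a hypothetical callable returning (state, message).
    timeout (whole seconds) should stay below the core's service check timeout.
    """
    if previous_state not in (OK, WARNING, CRITICAL, UNKNOWN):
        previous_state = UNKNOWN
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        return check()
    except PluginTimeout:
        # Our own timer fired first: keep the last known state.
        return previous_state, "timed out after %d seconds, keeping previous state" % timeout
    finally:
        signal.alarm(0)  # cancel a pending alarm
        signal.signal(signal.SIGALRM, old_handler)


def main(argv):
    # Assumption: the command definition passes $LASTSERVICESTATEID$ as argv[1].
    try:
        last = int(argv[1])
    except (IndexError, ValueError):
        last = UNKNOWN
    code, message = run_check(lambda: (OK, "bus is reporting"), last)
    print(message)
    return code
```

A real plugin would end with sys.exit(main(sys.argv)) so that the returned 
code becomes the plugin's exit status.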

>
> In the MTA's use case, if a bus was previously marked as up or down, its 
> state will become Unknown (as configured now), and send a round of 
> notification emails due to the state change (and for 6,000 buses, this 
> equates to 12,000 emails each for up and down since RDS tends to have 
> problems for hours at a time).  Furthermore, any previous acknowledgments are 
> erased and we lose all of our tracking.
>
> As for your suggestion about the $LASTSERVICESTATE$ macro, I'll attempt that 
> route, but I'm still concerned that CURL might cause this 255 error to occur 
> because the script timed out before it could return a status to Icinga.

btw - -1 is not a valid plugin exit code, and is therefore treated as you 
described ("out of bounds").
http://docs.icinga.org/latest/en/pluginapi.html
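
To illustrate where the 255 can come from: on a POSIX system the parent 
process only sees the low 8 bits of a child's exit status, so a plugin that 
exits with -1 is seen as 255, which is outside the 0-3 range the plugin API 
defines. A small Python sketch (the helper names are made up for the 
example) that shows this and maps anything invalid to UNKNOWN:

```python
import subprocess
import sys

VALID_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def exit_status_seen_by_parent(code):
    """Run a child process that exits with `code`; return what the parent sees."""
    child = subprocess.run(
        [sys.executable, "-c", "import sys; sys.exit(%d)" % code]
    )
    return child.returncode


def sanitize_state(code):
    """Map anything outside 0..3 to UNKNOWN (3). Hypothetical helper."""
    return code if code in VALID_STATES else 3


seen = exit_status_seen_by_parent(-1)
print(seen, VALID_STATES.get(seen, "out of bounds"))
```

Passing the plugin's result through something like sanitize_state before 
exiting keeps the return code inside the documented range.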

kind regards,
Michael


-- 
DI (FH) Michael Friedrich

Vienna University Computer Center
Universitaetsstrasse 7 A-1010 Vienna, Austria

email:  michael.friedr...@univie.ac.at
phone:  +43 1 4277 14359
mobile: +43 664 60277 14359
fax:    +43 1 4277 14338
web:    http://www.univie.ac.at/zid
         http://www.aco.net

Lead Icinga Core Developer
http://www.icinga.org


_______________________________________________
icinga-users mailing list
icinga-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/icinga-users
