Let me give you an example:

In Icinga 1:
Take a host that is the parent of 10 other hosts. It goes down. The 10 devices behind it should go UNREACHABLE, not DOWN. The problem we run into is the timing. In Icinga 1, soft_state_dependencies allows the soft state of the parent host to be considered for the reachability of the child hosts. If the child hosts are checked more frequently, or have a lower max_retries value, they could go into a hard DOWN state, and generate a notification, before the parent host reaches a hard DOWN state. Once the parent finally reaches a hard DOWN state, all of those child hosts go into the UNREACHABLE state, which we have configured not to send notifications for. If the parent host is down, I know everything behind it is going to be down. Let me quote from the Icinga 1 docs:

"By default, Icinga will notify contacts about both DOWN and UNREACHABLE host states. As an admin/tech, you might not want to get notifications about hosts that are UNREACHABLE. You know your network structure, and if Icinga notifies you that your router/firewall is down, you know that everything behind it is unreachable.

If you want to spare yourself from a flood of UNREACHABLE notifications during network outages, you can exclude the unreachable (u) option from the notification_options directive in your host definitions and/or the host_notification_options directive in your contact definitions."

Which is exactly what we do. But without soft_state_dependencies it all becomes dependent on timing: the parent host has to reach a hard DOWN state before the child hosts can go to an UNREACHABLE state, and there is not a lot of control over that timing. Checking the child hosts less often, or using a higher max_retries value, isn't an option for us. Generally those child hosts are access switches that don't have much redundancy, or that have a large number of singly attached devices, so we are more sensitive to their state.
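
To make that concrete, our setup looks roughly like this (the host names and template are just examples; soft_state_dependencies itself is a directive in the main icinga.cfg, not an object directive):

    # icinga.cfg: consider soft states when evaluating dependencies
    soft_state_dependencies=1

    # host template: notify on DOWN and recovery, but never UNREACHABLE
    define host {
        name                    generic-host
        notification_options    d,r     ; 'u' deliberately left out
        register                0
    }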

It's really about controlling the quantity of alerts that are generated. Take the example above: 1 parent and 10 child hosts.
Without soft_state_dependencies: if the child hosts reach a hard state before the parent, 11 alerts are generated, 10 from the children and 1 from the parent.
With soft_state_dependencies: only 1 alert is generated, and it's from the parent host when it reaches a hard state. The children have already gone to the UNREACHABLE state, because the soft state of the parent (we use max_retries = 3 in most cases) is used to evaluate dependent reachability.
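
In Icinga 1 terms that topology is nothing more than the parents directive plus the retry window; a sketch with made-up host names (max_check_attempts is the directive behind what I've been calling max_retries):

    define host {
        use                 generic-host
        host_name           core-router         ; the parent
        max_check_attempts  3
    }

    define host {
        use                 generic-host
        host_name           access-sw-01        ; one of the 10 children
        parents             core-router
        max_check_attempts  3
    }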

I've only been picking on hosts, but the same applies to services as well. We monitor both the state of interfaces and BGP sessions. The BGP session is dependent on an interface. Without using the soft state of the interface, I would get an alert for both the BGP session and the interface. If the interface is down, I know the BGP session is going to be down as well; I don't need alerts for both. Because I can't control the timing of those 2 checks, there is a high probability that the BGP session will reach a hard state before the interface does, in which case I get a notification for both. Using the soft state, as long as one check interval has completed and the interface is soft CRITICAL, the BGP session won't notify. Since we use a max_retries value greater than 1, that will always be true.
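
As a sketch, that pairing is an ordinary Icinga 1 servicedependency (the host and service names here are invented):

    define servicedependency {
        host_name                      core-router
        service_description            Interface Gi0/1
        dependent_host_name            core-router
        dependent_service_description  BGP Session AS65001
        ; suppress BGP notifications while the interface is warning/unknown/critical
        notification_failure_criteria  w,u,c
    }

With soft_state_dependencies=1 the notification_failure_criteria is evaluated against the interface's soft state too, which is what suppresses the duplicate BGP notification.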

On 11/1/2014 9:36 AM, Michael Friedrich wrote:
On 17.10.2014 at 23:43, Barry Quiel wrote:
I can't find any reference in the Icinga 2 docs to the Icinga 1
feature soft_state_dependencies. I didn't find any reference on the
monitoring-portal boards or in the icinga users mail archive.

Was this option carried forward from Icinga 1?
Is it an implicit option now?
Was it renamed?

Everything which is not explicitly mentioned in the migration docs is
not part of Icinga 2's architecture. There are certainly some features
inherited from Nagios as a fork which have been cut off, or just not
implemented in the new design. There was a long list of features which
was evaluated step by step for its importance; some of them
have been re-implemented with new algorithms, some found a new "home"
after a few revisions during the tech preview cycle, and some simply
don't exist (like the problem id someone asked about lately).


Without that option the dependencies are less effective. There
is no way to line up the timing of the checks so that the child
hosts/services check after the parent goes into a hard state. This is
a crucial setting to help reduce the number of alerts around
correlated events.

I don't see how this would make sense with Icinga 2, as check results are
generally not cached, or written as check result files on disk and reaped at an
interval (10 seconds by default in the worst case). Instead, the
reachability is immediately evaluated based on the available object states.

If you could give us a real-world example using Icinga 1.x, and show how you
have ported this into Icinga 2, we could discuss, and learn, what
problems you encounter with Icinga 2. For now, your explanations are a
bit too vague in my opinion.

Kind regards,
Michael

_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users

