Hi everyone,

I'm new to this list, but I'm a long-time user of Nagios and now Icinga.
Over the years I've set up some reasonably large instances (hundreds of
hosts, thousands of services).

I'm in the process of overhauling an old Nagios setup to Icinga (nominally
1.7.1, as that is what is currently packaged for Debian).
I'm using what I've learned over the years to try to make the best use of
the Nagios/Icinga config file template and inheritance properties, to
simplify my life in what will end up being a system with over 200 hosts and
perhaps 4000 services.

I have hit an unexpected problem. It feels to me like the issue is with the
parsing of the config files and how the inheritance is applied, but I don't
rule out it actually being caused by some misunderstanding on my part.

Want a challenge? Take a look at the config below and see if you can shed
any light on why it doesn't work.
Note - there are lots of words in this email - I've tried to be concise,
but at the same time, I clearly need to explain my thinking, and what I
have done.
If you do choose to look, then I thank you in advance for your time and
patience.

Ok, here goes.
Firstly, I want to add a concept of priorities to Icinga.
I wish to split all service definitions into one of three priorities -
critical, normal and low.
I call these 1 for critical, 2 for normal and 3 for low.
I created some service templates for this:
define service{
        name                            t:service:priority:1
        max_check_attempts              2
        notifications_enabled           1
        notification_options            w,c,r
        notification_interval           15
        first_notification_delay        0
        notification_period             24x7
        register                        0
}
define service{
        name                            t:service:priority:2
        max_check_attempts              3
        notifications_enabled           1
        notification_options            c,r
        notification_interval           60
        first_notification_delay        0
        notification_period             24x7
        register                        0
}
define service{
        name                            t:service:priority:3
        max_check_attempts              3
        notifications_enabled           1
        notification_options            c,r
        notification_interval           90
        first_notification_delay        0
        notification_period             workhours
        register                        0
}

Next, I have a root template for services that defines the other basics
applied to everything:
define service{
        name                            t:service:root
        check_period                    24x7
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 1
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        contact_groups                  admins
        register                        0
}

As with most Icinga setups, my services are tested in one of two ways -
active or passive (using NSCA).
So I have some service templates for these too:
define service{
        name                            t:service:passive
        use                             t:service:root
        active_checks_enabled           0
        passive_checks_enabled          1
        check_freshness                 1
        check_command                   c:passive:no-update
        register                        0
}
define service{
        name                            t:service:active
        use                             t:service:root
        active_checks_enabled           1
        passive_checks_enabled          0
        check_freshness                 0
        register                        0
}
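
Aside: the c:passive:no-update check_command referenced in the passive
template above is just what runs when a passive service fails its freshness
check. Its exact definition doesn't matter for this question; a typical
version simply wraps the stock check_dummy plugin to force a CRITICAL,
something like:
define command{
        command_name    c:passive:no-update
        command_line    $USER1$/check_dummy 2 "No passive check result received recently"
}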

Now let's look at a specific service we want to check - disk space.
Disk is passively reported, and for any given host we may want it to alert
at (one of) priority 1, 2 or 3.
This means that my disk service tests ultimately get defined using this
collection of templates, services and hostgroups:

# Firstly, a non-prioritised service template that defines the check interval and service_description
# Note we inherit t:service:passive to pull in the usual passive check values, and ultimately the t:service:root values too
define service{
        name                    t:service:passive:core:disk
        use                     t:service:passive
        service_description     DISK
        check_interval          10
        register                0
}

# Next we add priorities
# We define three registered services - one each for priorities 1, 2 and 3
# They inherit the above t:service:passive:core:disk template
# Note each service here defines its own hostgroup to specify which hosts will use the service
define service{
        use                     t:service:passive:core:disk,t:service:priority:1
        hostgroup_name          g:host:passive:core:disk:1
        register                1
}
define service{
        use                     t:service:passive:core:disk,t:service:priority:2
        hostgroup_name          g:host:passive:core:disk:2
        register                1
}
define service{
        use                     t:service:passive:core:disk,t:service:priority:3
        hostgroup_name          g:host:passive:core:disk:3
        register                1
}

# And here are the hostgroup definitions for the above service definitions
# Note we don't specify the hosts here - we'll use the trick of specifying them in the host def instead
define hostgroup{
        hostgroup_name          g:host:passive:core:disk:1
        alias                   Disk usage (Priority 1)
}
define hostgroup{
        hostgroup_name          g:host:passive:core:disk:2
        alias                   Disk usage (Priority 2)
}
define hostgroup{
        hostgroup_name          g:host:passive:core:disk:3
        alias                   Disk usage (Priority 3)
}

I have multiple other 'core' passive services defined in the above way.
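
For example, the mailq one follows exactly the same shape (values here are
illustrative):
define service{
        name                    t:service:passive:core:mailq
        use                     t:service:passive
        service_description     MAILQ
        check_interval          10
        register                0
}
define service{
        use                     t:service:passive:core:mailq,t:service:priority:1
        hostgroup_name          g:host:passive:core:mailq:1
        register                1
}
Plus the matching priority 2 and 3 services and the three
g:host:passive:core:mailq:N hostgroups.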

Let's say I have 4 core passive services on a host named 'host_z' (in
reality I have many more).
They are for disk, mailq, cpu and logs.
This is an important host for me, so I want each of those services to alert
at priority 1.
I can specify this the longhand way, using the following in the host
definition:
define host{
        host_name       host_z
        display_name    some details here
        address         10.99.99.1
        hostgroups      g:host:passive:core:disk:1,g:host:passive:core:mailq:1,g:host:passive:core:cpu:1,g:host:passive:core:logs:1
        use             t:host:server,t:host:priority:1
}

The above works as expected - the config is parsed, and my host ends up
with services defined for disk, mailq, cpu and logs, and they alert when in
a non-OK state with the periodicity I defined for priority 1.

However, here is where things get interesting, and a little odd.
The above is fine when there is a small number of services to define
against a host, but in reality it is very cumbersome for me, as I have 20
or so to define on each host, and hundreds of hosts.

So I thought I'd come up with a touch of brilliance in my use of
inheritance to fix things.
I created another hostgroup like this:
define hostgroup{
        hostgroup_name          g:host:passive:core:1
        alias                   All Priority 1 Passively Reported Core Services
        hostgroup_members       g:host:passive:core:disk:1,g:host:passive:core:mailq:1,g:host:passive:core:cpu:1,g:host:passive:core:logs:1
}
(I created others for priority 2, 3 etc. too, and my actual versions had 20
or so hostgroups in the hostgroup_members line.)

I then modified my host definition to read:
define host{
        host_name       host_z
        display_name    some details here
        address         10.99.99.1
        hostgroups      g:host:passive:core:1
        use             t:host:server,t:host:priority:1
}

Having done the above, I ran a check on the config, confident that it would
all work nicely.
It did! erm, and it didn't.
The config is parsed, and no errors are found.
However, no services are defined against host_z any more.
None at all.


So - to try and summarise the above in a few words:
1) If I include the hostgroups directly in my host, they are correctly
parsed, and my services exist.
2) If I include those same hostgroups in another hostgroup via
hostgroup_members, and then use that new hostgroup in my host, my services
vanish, but Icinga sees the config as valid (note I have
allow_empty_hostgroup_assignment=1; without this enabled, the parser
decides the config has errors, due to hostgroups with no members).
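
For reference, that setting lives in the main config file (icinga.cfg on
my install; the path may differ on yours):
# Allow a registered service to point at a hostgroup with no members
# without the pre-flight verification flagging it as an error
allow_empty_hostgroup_assignment=1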

From my reading of the documentation, it seems to me like what I'm trying
should work.
Am I wrong?
Is the Icinga config parser wrong?

One final note:
I have worked around the issue for now.
Instead of my g:host:passive:core:1 hostgroup that contains all the
priority 1 hostgroups, I have come up with another trick:
I have set use_regexp_matching=1,
and now my host definition reads:
define host{
        host_name       host_z
        display_name    some details here
        address         10.99.99.1
        hostgroups      g:host:passive:core:.*:1
        use             t:host:server,t:host:priority:1
}

This works a treat!
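
Again for reference, the main-config side of this workaround, with my
understanding of when the regex matching actually kicks in:
# With this set, directive values containing regex-style characters
# (*, ?, +, \.) are treated as regular expressions, which is how the
# g:host:passive:core:.*:1 pattern above gets expanded against my hostgroups
use_regexp_matching=1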

Thanks for your time,

Chris