Petr Bena <[email protected]> wrote:

> I think it's a time to finally make it possible for users to create
> own check for nagios (icinga).
> I will try to document how current icinga is setup on
> https://wikitech.wikimedia.org/wiki/Icinga/Labs
> Currently there is a nagiosbuilder which is a python script made by
> Damian which query the ldap and build nagios cfg files based on that.
> My idea is to create configuration files / templates for this
> nagiosbuilder so that it would apply different options for certain
> hosts based on this configuration. Users would just
> * create own check, place it somewhere on the server which they want
> to monitor the service at and insert it to nrpe
> (/etc/nagios/nrpe.d/yourservice.cfg)
> * use some interface (to be discussed) to insert this check for specific host
> and nagiosbuilder would
> * query that interface in order to generate configuration files
> * based on this config would set up custom services for these hosts
> For us (developers) it's most easy to use gerrit as this interface, so
> that people would directly update these configuration files used by
> nagiosbuilder, however that pretty much suck, so I think it would be
> better to create a new interface into labsconsole, so that people can
> define their nagios checks directly as a property of each node.
> That of course would require more coding and ops assistance but I
> think it's doable. Some opinions?

a) Great to see some progress on that front :-).

b) I think differences between production and Labs hosts should
   be minimal. If we have to add (and sync!) checks manually,
   it's gonna be a big mess. If I configure an instance with
   the class redis, that class's monitoring should be used
   without any further intervention. (If I set up my own
   puppetmaster and use changes not committed to the WMF
   repository yet, that would be an acceptable exception.)
   I assume this is most important for Beta
   (cf. https://bugzilla.wikimedia.org/51497), but all other
   projects managed by operations/puppet would profit from
   that as well.
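
To illustrate the "class implies monitoring" idea, here is a minimal sketch, not the actual nagiosbuilder code. It assumes the builder already knows each instance's puppet classes (from LDAP, say) and that the Icinga side has a `check_nrpe` command and a `generic-service` template defined. The names `CLASS_CHECKS`, `render_service`, `build_host_config` and the NRPE command `check_redis` are all hypothetical.

```python
# Hypothetical mapping: puppet class -> list of (description, NRPE command).
# In a real setup this would live next to the puppet class, not in the builder.
CLASS_CHECKS = {
    "redis": [("Redis", "check_redis")],
}

# Nagios/Icinga 1.x object definition. "generic-service" and "check_nrpe"
# are assumed to be defined elsewhere in the Icinga configuration.
SERVICE_TEMPLATE = """define service {{
    use                  generic-service
    host_name            {host}
    service_description  {description}
    check_command        check_nrpe!{nrpe_command}
}}
"""


def render_service(host, description, nrpe_command):
    """Return one service definition for a single NRPE check."""
    return SERVICE_TEMPLATE.format(
        host=host, description=description, nrpe_command=nrpe_command
    )


def build_host_config(host, puppet_classes, class_checks=CLASS_CHECKS):
    """Return all service definitions implied by a host's puppet classes.

    Classes without an entry in class_checks are skipped silently.
    """
    blocks = []
    for cls in sorted(puppet_classes):
        for description, nrpe_command in class_checks.get(cls, []):
            blocks.append(render_service(host, description, nrpe_command))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Example instance name and class list are made up.
    print(build_host_config("redis-test.example", ["redis"]))
```

With something along these lines, applying the class to an instance would be the only step a user has to take; nothing needs to be added or synced by hand.
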
   Comment #2 of that bug refers to security reasons for why
   we can't copy the production setup verbatim, but before we
   introduce another system, I think we should try to mitigate
   those concerns first, i.e. see what users can fiddle with
   without review and what impact that has on the processes
   run on the puppetmaster and/or Icinga.

What we will need some sort of UI for is the configuration of
alerts. I assume not every project wants to receive mails or
IRC messages whenever something's broken, and even in one
project there may be different levels or interests.

Tim

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
