I think Ryan said something like he would happily get rid of puppet or replace it with a better solution :P but if you really want to keep things managed by puppet, I still see an issue with projects that aren't using puppet, or that use a different puppetmaster.
To be honest, from my point of view, puppet as it is now on labs is almost unusable for non-ops users. Getting any simple change merged, unless it's a top-priority thing, requires someone from ops and usually takes at least a few hours, if not days. I can't imagine any sysadmin who can work like this: some changes need to be applied immediately, and you can't wait days for them to happen. So I expect that the vast majority of projects that exist now will not use puppet anyway (you just can't force people to use it under these circumstances), and they wouldn't benefit from this.

That is why I think that even if we are going to use this puppet nrpe management, there should still be a way to make manual adjustments, not just because of these projects but also to fix other icinga issues. For example, right now icinga receives some nonsense (broken) data from LDAP about instances that don't even exist anymore. If it weren't for that nasty workaround, an instance ignore list that prevents these hosts from being monitored, icinga would be full of hosts that are down. How would you apply an i_dont_exist puppet class to a nonexistent node? :P

I have nothing against "labs cloning production", except that IMHO it should be the other way around: production should actually clone labs, which is the testing environment where changes should happen first before they get deployed to production. Still, labs != production, so I think we could have something extra here, existing only on labs and not on production, that would make it easier for regular, non-ops people to manage icinga.

On Tue, Feb 4, 2014 at 12:10 AM, Antoine Musso <[email protected]> wrote:
> On 03/02/2014 19:32, Petr Bena wrote:
>> I think it's time to finally make it possible for users to create
>> their own checks for nagios (icinga).
>>
>> I will try to document how the current icinga is set up on
>> https://wikitech.wikimedia.org/wiki/Icinga/Labs
>>
>> Currently there is a nagiosbuilder, which is a python script made by
>> Damian that queries LDAP and builds nagios cfg files based on that.
> <snip>
>
> Thank you Petan for resurrecting Icinga on labs :-]
>
> In production, the nrpe checks are being transitioned to use a define
> such as:
>
>   nrpe::monitor_service { 'jenkins':
>     description  => 'jenkins_service_running',
>     nrpe_command => "/usr/lib/nagios/plugins/check_procs -w 1:1 -c 1:1 --ereg-argument-array '^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war'"
>   }
>
> Whenever that is run on an instance, it will provision the nrpe command
> under /etc/nagios/nrpe.d. An example for the beta bastion:
>
>   deployment-bastion$ ls -1 /etc/nagios/nrpe.d
>   check_disk_space.cfg
>   check_dpkg.cfg
>   check_puppet_disabled.cfg
>   check_raid.cfg
>   $
>
> The challenge is finding out which commands are on the instances:
>
> - I am pretty sure nrpe does not let you list available commands
> - from LDAP you only have the role class and have no clue which defines
>   have been run on the target instance
>
> :(
>
> --
> Antoine "hashar" Musso
>
>
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l
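Antoine's point that nrpe itself can't list its commands could at least be worked around on the instance side, by reading the files puppet drops into /etc/nagios/nrpe.d. Below is a minimal Python sketch, assuming every *.cfg file there uses the standard NRPE `command[name]=command line` syntax. `list_nrpe_commands` is a hypothetical helper, not part of any existing labs tooling, and getting its output back to the icinga host is left open.

```python
import re
from pathlib import Path

# Standard NRPE include-file syntax: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def list_nrpe_commands(conf_dir="/etc/nagios/nrpe.d"):
    """Return {command_name: command_line} for every *.cfg file in conf_dir.

    Lines that do not match the command[...]= pattern (comments, other
    directives) are ignored. A missing directory returns {}.
    """
    conf_path = Path(conf_dir)
    if not conf_path.is_dir():
        return {}
    commands = {}
    for cfg in sorted(conf_path.glob("*.cfg")):
        for line in cfg.read_text().splitlines():
            match = COMMAND_RE.match(line)
            if match:
                name, cmdline = match.groups()
                commands[name] = cmdline
    return commands


if __name__ == "__main__":
    # Run on the instance itself, e.g. on deployment-bastion.
    for name, cmdline in sorted(list_nrpe_commands().items()):
        print(f"{name}: {cmdline}")
```

This only reports what is on disk. It says nothing about whether the nrpe daemon has been reloaded since the files changed.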

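The nagiosbuilder approach described in the quoted message (query LDAP, write nagios cfg files) can be sketched roughly as follows. The sketch also includes the instance ignore list workaround mentioned above. This is not Damian's script. The `Instance` record, `build_host_config`, the `generic-host` template name and the sample addresses are all hypothetical, and the LDAP query itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    # Hypothetical shape of one instance record as read from LDAP.
    name: str
    address: str
    project: str


def build_host_config(instances, ignore_list):
    """Render one Nagios/Icinga 'define host' block per instance.

    Instances whose name is in ignore_list (stale LDAP entries for
    instances that no longer exist) are skipped entirely.
    """
    blocks = []
    for inst in sorted(instances, key=lambda i: i.name):
        if inst.name in ignore_list:
            continue
        blocks.append(
            "define host {\n"
            f"    host_name  {inst.name}\n"
            f"    alias      {inst.project}\n"
            f"    address    {inst.address}\n"
            "    use        generic-host\n"  # assumed template name
            "}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    ldap_instances = [  # hypothetical data standing in for the LDAP query
        Instance("deployment-bastion", "192.0.2.10", "deployment-prep"),
        Instance("old-instance", "192.0.2.11", "deployment-prep"),
    ]
    print(build_host_config(ldap_instances, ignore_list={"old-instance"}))
```

The ignore list has to be maintained by hand, which is exactly the kind of manual adjustment that has no natural place in a puppet-only setup.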