On Tue, Oct 7, 2014 at 6:00 PM, Michael Friedrich < [email protected]> wrote:
> Hi,
>
> On 01.10.2014 at 10:55, Zsolt Dollenstein wrote:
>
> > Hi,
> >
> > We at Prezi are trying to migrate over to icinga2 and we've hit what seems
> > like a showstopper for us. We've spent about 2 days trying to debug the
> > issue to no avail, so any pointers are welcome.
>
> Which version of Icinga 2, and how was Icinga 2 installed on which
> distribution?

We are running off of the current master (with these
<https://github.com/prezi/icinga2/compare/prezi-release> changes) on Ubuntu.
We built icinga2 with the Debian packaging mechanisms in the repo (using
dpkg-buildpackage).

> > In short, the issue is this: sometimes when we reload our icinga2 config
> > (via SIGHUP), both the new and old icinga2 processes stop working. This
> > happens about once every 4 reloads.
>
> What comes to mind: Try strace and/or gdb attaching the 2 processes and
> trace their actions after sending a SIGHUP signal.
>
> http://docs.icinga.org/icinga2/latest/doc/module/icinga2/chapter/troubleshooting#debug

Thanks, we haven't tried gdb yet, will give it a shot soon. Strace was not
terribly helpful because of the amount of active checks (maybe we should try
without tracing forks and just attach it to the two processes).

> > From the logs it looks like the old process thinks all is well and is
> > terminating as expected (AFAICT the new process kills it properly). I can't
> > find any logs from the new process, not to mention any errors/warnings. We
> > have no idea why the new process stops. We have tried to turn on debug
> > logging to no avail. We even tried patching the code to see more logs from
> > the child process, and we were able to verify that it successfully parses
> > the configs and proceeds to shut down the parent.
>
> May we see these modifications (git patch)? Maybe there's some additional
> logging missing here.

Sure, https://github.com/prezi/icinga2/compare/prezi-release

Specifically, I meant this:
https://github.com/prezi/icinga2/commit/ad90733b67a204754523206e757c48f948ae906a
and another which I haven't bothered to check in (this was to make sure the
child's stdout is not swallowed):
https://gist.github.com/zsol/00d5bb59b12d48406810

> > This is a big problem for us because we have a biggish config (about 30K
> > services and 90K Notifications), so starting up (or validating the
> > configuration) takes about 5 minutes on a decent machine, which means when
> > this scenario happens, we're flying blind for that amount of time.
>
> Just curious - what's a "decent machine"? 5 minutes sounds way too much
> for that amount of objects.

It's a c3.2xlarge type instance on AWS EC2:
http://www.ec2instances.info/?filter=c3.2xl

Awesome to hear this because we thought it was weird, too :) Maybe I'll find
some time to profile config parsing.

> > Any pointers are appreciated.
> >
> > [apologies for possibly duplicate emails, I think one copy of this is
> > sitting in the moderation queue]
>
> Will remove that later on, no worries.
>
> Kind regards,
> Michael
>
> --
> Michael Friedrich, DI (FH)
> Application Developer
>
> NETWAYS GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg
> Tel: +49 911 92885-0 | Fax: +49 911 92885-77
> GF: Julian Hein, Bernd Erk | AG Nuernberg HRB18461
> http://www.netways.de | [email protected]

--
Zsolt Dollenstein
Developer at Prezi <http://prezi.com>
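
For reference, the idea mentioned in this message (attaching strace to just the two icinga2 processes, without tracing forks) could look roughly like the sketch below. It is only an illustration, not something taken from the thread: the helper name `build_strace_command`, the PIDs, and the output path are all hypothetical. It relies on the standard strace options `-p` (attach to a PID), `-tt` (timestamps), and `-o` (write to a file), and deliberately leaves out `-f`. One caveat: without `-f`, strace attaches only to the thread whose ID is given, so for a multithreaded daemon this would trace just the main thread.

```python
import subprocess
from typing import List, Sequence


def build_strace_command(pids: Sequence[int], output_path: str) -> List[str]:
    """Build an strace command line that attaches only to the given PIDs.

    The -f flag is deliberately omitted, so processes forked for active
    checks are not followed.
    """
    if not pids:
        raise ValueError("at least one PID is required")
    # -tt adds timestamps to each line, -o writes the trace to a file.
    command = ["strace", "-tt", "-o", output_path]
    for pid in pids:
        command += ["-p", str(pid)]
    return command


def trace_processes(pids: Sequence[int], output_path: str) -> int:
    """Run strace until it exits; attaching usually needs root privileges."""
    return subprocess.run(build_strace_command(pids, output_path)).returncode
```

With the (made-up) PIDs of the old and new process, `build_strace_command([1234, 5678], "/tmp/icinga2-reload.strace")` yields the argument list for `strace -tt -o /tmp/icinga2-reload.strace -p 1234 -p 5678`, which would be started just before sending SIGHUP.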
_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users
