On 08.10.2014 at 11:41, Zsolt Dollenstein wrote:
> On Tue, Oct 7, 2014 at 6:00 PM, Michael Friedrich <[email protected]> wrote:
>> Hi,
>>
>> On 01.10.2014 at 10:55, Zsolt Dollenstein wrote:
>>> Hi,
>>>
>>> We at Prezi are trying to migrate over to icinga2 and we've hit what seems like a showstopper for us. We've spent about 2 days trying to debug the issue to no avail, so any pointers are welcome.
>>
>> Which version of Icinga 2, and how was Icinga 2 installed on which distribution?
>
> We are running off of the current master (with these <https://github.com/prezi/icinga2/compare/prezi-release> changes) on Ubuntu. We built icinga2 with the Debian packaging mechanisms in the repo (using dpkg-buildpackage).
Uhm. Keep in mind that the master branch is used for current development towards the 2.2 feature milestone. Today the CLI commands base was merged, which introduces certain changes. If I were you, I would go for support/2.1 and build on that, and only switch to master if this is really just a playground and the developers ask you to test something.
>>> In short, the issue is this: sometimes when we reload our icinga2 config (via SIGHUP), both the new and old icinga2 processes stop working. This happens about once every 4 reloads.
>>
>> What comes to mind: Try strace and/or gdb attaching the 2 processes and trace their actions after sending a SIGHUP signal.
>>
>> http://docs.icinga.org/icinga2/latest/doc/module/icinga2/chapter/troubleshooting#debug
>
> Thanks, we haven't tried gdb yet, will give it a shot soon. Strace was not terribly helpful because of the amount of active checks (maybe we should try without tracing forks and just attach it to the two processes).
Hmmm, yeah, I'd only look into what the "parent" process is doing before it stops its operation.

What comes to mind: when the old parent process receives the termination signal, it stores its state data in the icinga2.state file. Once the process has terminated successfully, the child process takes over and re-reads the state file to ensure that the history is without loss after doing config validation. There might be some sort of race condition here, but that's just a blind guess.

The current git master moves the daemon code into the cli subcommand, so for now you'll find it below lib/cli/daemoncommand.cpp (this could change in the next weeks though).

For compiling: the package build uses the release flag with debug symbols. You might want to recompile in debug mode to get more information too. CMAKE_BUILD_TYPE=Debug is what you're looking for. Below is an excerpt of the bashrc I use for building different types of Icinga 2.

export CMAKE_OPTS_DEBUG="-DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_BUILD_TYPE=Debug -DICINGA2_USER=icinga -DICINGA2_GROUP=icinga -DICINGA2_COMMAND_USER=icinga -DICINGA2_COMMAND_GROUP=icingacmd"

export CMAKE_OPTS_NORMAL="-DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var -DCMAKE_BUILD_TYPE=RelWithDebInfo -DICINGA2_USER=icinga -DICINGA2_GROUP=icinga -DICINGA2_COMMAND_USER=icinga -DICINGA2_COMMAND_GROUP=icingacmd"

alias icinga2_debug='rm -rf build ; mkdir build ; cd build ; cmake $CMAKE_OPTS_DEBUG .. ; sudo make -j8 install ; cd ..'

alias icinga2_normal='rm -rf build ; mkdir build ; cd build ; cmake $CMAKE_OPTS_NORMAL .. ; sudo make -j8 install ; cd ..'
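With a debug build installed, attaching to only the old and the new daemon process (rather than to every forked check) could look roughly like the sketch below. The helper names run, trace_pid and backtrace_pid are made up for illustration, and the PIDs and log paths are whatever applies on your system. Only standard strace and gdb options are used.

```shell
# run: execute a command, or only print it when DRY_RUN is set (for inspection).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: attach strace to one PID and write the trace to a file.
# -tt adds microsecond timestamps, -o selects the output file.
# -f is deliberately omitted so that forked check plugins are not traced;
# note that without -f strace only follows the thread whose ID equals the PID.
trace_pid() {
    run strace -tt -p "$1" -o "$2"
}

# Hypothetical helper: print backtraces of all threads of one PID in batch
# mode; gdb detaches again when it exits.
backtrace_pid() {
    run gdb -p "$1" -batch -ex "thread apply all bt"
}
```

For example, run "trace_pid <old pid> /tmp/old.trace" and "trace_pid <new pid> /tmp/new.trace" in two terminals before sending SIGHUP, and call backtrace_pid on whichever process is still around once things hang. Both tools usually need root or the daemon's user to attach.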
>>> From the logs it looks like the old process thinks all is well and is terminating as expected (AFAICT the new process kills it properly). I can't find any logs from the new process, not to mention any errors/warnings. We have no idea why the new process stops. We have tried to turn on debug logging to no avail. We even tried patching the code to see more logs from the child process, and we were able to verify that it successfully parses the configs and proceeds to shut down the parent.
>>
>> May we see these modifications (git patch)? Maybe there's some additional logging missing here.
>
> Sure, https://github.com/prezi/icinga2/compare/prezi-release
Hmmm. Some patches on that list would make sense upstream; otherwise you're keeping a local fork (no one wants that). Feel free to open issues with attached patches or PR URLs.
> Specifically, I meant this: https://github.com/prezi/icinga2/commit/ad90733b67a204754523206e757c48f948ae906a
That looks like an issue we've had before, where a reload does not log any feedback. I'll talk with Gunnar about that tomorrow.
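As long as a reload gives no feedback in the logs, one way to tell from the outside whether it succeeded is to check for a live daemon afterwards. The following is only a sketch under assumptions: reload_and_check is a made-up helper, and it assumes that the daemon keeps a pidfile which the new process rewrites once it has taken over.

```shell
# Hypothetical helper: send SIGHUP to the PID recorded in a pidfile, wait,
# then check that the PID now recorded in the pidfile is a live process.
# Returns 0 if a process is alive, 1 if not, 2 if the reload could not be sent.
reload_and_check() {
    pidfile=$1
    wait_secs=$2
    old_pid=$(cat "$pidfile") || return 2
    kill -HUP "$old_pid" || return 2
    # Give config validation and the hand-over time to finish.
    sleep "$wait_secs"
    new_pid=$(cat "$pidfile") || return 2
    # kill -0 sends no signal; it only tests whether the process exists.
    if kill -0 "$new_pid" 2>/dev/null; then
        echo "alive pid=$new_pid"
    else
        echo "dead pid=$new_pid"
        return 1
    fi
}
```

Called as "reload_and_check <path to pidfile> 300" (the wait should exceed the roughly 5 minutes that validation takes here), a non-zero return would flag the "both processes gone" case right away. The pidfile location depends on the build's local state directory.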
> and another which I haven't bothered to check in (this was to make sure the child's stdout is not swallowed): https://gist.github.com/zsol/00d5bb59b12d48406810
>
>>> This is a big problem for us because we have a biggish config (about 30K services and 90K Notifications), so starting up (or validating the configuration) takes about 5 minutes on a decent machine, which means when this scenario happens, we're flying blind for that amount of time.
>>
>> Just curious - what's a "decent machine"? 5 minutes sounds way too much for that amount of objects.
>
> It's a c3.2xlarge type instance on AWS EC2: http://www.ec2instances.info/?filter=c3.2xl
So 8 cores, 16gb and fast disks.
> Awesome to hear this because we thought it was weird, too :) Maybe I'll find some time to profile config parsing.
Most likely you'll go the Valgrind way, or profile only configuration snippets.

https://blog.netways.de/2013/09/05/profiling-mit-gperftools/

Not sure what else may help, but we'll see.

Kind regards,
Michael
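For the Valgrind route, a minimal sketch of running a command under Callgrind might look like this. profile_cmd and run are made-up helper names; the command to profile is passed in by the caller, because the exact config validation invocation differs between Icinga 2 versions and is not assumed here.

```shell
# run: execute a command, or only print it when DRY_RUN is set (for inspection).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: run the given command under Valgrind's Callgrind tool.
# $1 is the profile output file, the remaining arguments are the command.
profile_cmd() {
    out=$1
    shift
    run valgrind --tool=callgrind --callgrind-out-file="$out" "$@"
}
```

Usage would be "profile_cmd /tmp/callgrind.out <your validation command>", followed by "callgrind_annotate /tmp/callgrind.out" to see where the time goes. Execution under Callgrind is considerably slower than a normal run, which is one more reason to profile only a configuration snippet rather than the full 30K-service config.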
>>> Any pointers are appreciated.
>>>
>>> [apologies for possibly duplicate emails, I think one copy of this is sitting in the moderation queue]
>>
>> Will remove that later on, no worries.
>>
>> Kind regards,
>> Michael
>>
>> --
>> Michael Friedrich, DI (FH)
>> Application Developer
>>
>> NETWAYS GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg
>> Tel: +49 911 92885-0 | Fax: +49 911 92885-77
>> GF: Julian Hein, Bernd Erk | AG Nuernberg HRB18461
>> http://www.netways.de | [email protected]
>>
>> ** Puppet Camp Duesseldorf 2014 - October - netways.de/puppetcamp **
>> ** OSMC 2014 - November - netways.de/osmc **
>> ** OpenNebula Conf 2014 - December - opennebulaconf.com **
>> ** OSDC 2015 - April - osdc.de **
>
> --
> *Zsolt Dollenstein*
> Developer at Prezi <http://prezi.com>
--
Michael Friedrich, DI (FH)
Application Developer

_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users
