On Tue, Oct 7, 2014 at 6:00 PM, Michael Friedrich <
[email protected]> wrote:

>  Hi,
>
> On 01.10.2014 at 10:55, Zsolt Dollenstein wrote:
>
>  Hi,
>
> We at Prezi are trying to migrate over to icinga2 and we've hit what seems
> like a showstopper for us. We've spent about 2 days trying to debug the
> issue to no avail, so any pointers are welcome.
>
>
> Which version of Icinga 2, and how was Icinga 2 installed on which
> distribution?
>

We are running the current master (with these
<https://github.com/prezi/icinga2/compare/prezi-release> changes) on
Ubuntu. We built icinga2 with the Debian packaging in the repo
(using dpkg-buildpackage).


>
>
> In short, the issue is this: sometimes when we reload our icinga2 config
> (via SIGHUP), both the new and old icinga2 processes stop working. This
> happens about once every 4 reloads.
>
>
> What comes to mind: Try strace and/or gdb attaching the 2 processes and
> trace their actions after sending a SIGHUP signal.
>
>
> http://docs.icinga.org/icinga2/latest/doc/module/icinga2/chapter/troubleshooting#debug
>
>
Thanks, we haven't tried gdb yet; we will give it a shot soon.
strace was not very helpful because of the number of active checks
(maybe we should try it without following forks and attach it only to
the two processes).
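
A minimal sketch of that second approach, assuming the PIDs of the old and new icinga2 processes are already known. It only builds the strace command line; the helper name, the example PIDs and the output path are made up for illustration. Leaving out -f is what keeps the forked check plugins out of the trace.

```python
import shlex


def strace_attach_command(pids, output_prefix="/tmp/icinga2-reload"):
    """Build an strace argv that attaches only to the given PIDs.

    -f is deliberately omitted, so children forked for active checks
    are not followed. output_prefix is a hypothetical location.
    """
    if not pids:
        raise ValueError("need at least one PID")
    # -tt adds wall-clock timestamps, -o writes the trace to a file.
    argv = ["strace", "-tt", "-o", f"{output_prefix}.trace"]
    for pid in pids:
        # strace accepts several -p options in one invocation.
        argv += ["-p", str(pid)]
    return argv


if __name__ == "__main__":
    # 1234 and 5678 stand in for the parent and child icinga2 PIDs.
    print(shlex.join(strace_attach_command([1234, 5678])))
```

The printed command would be started just before sending SIGHUP, so the trace covers the hand-over between the two processes.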


>
> From the logs it looks like the old process thinks all is well and is
> terminating as expected (AFAICT the new process kills it properly). I can't
> find any logs from the new process, not to mention any errors/warnings. We
> have no idea why the new process stops. We have tried to turn on debug
> logging to no avail. We even tried patching the code to see more logs from
> the child process, and we were able to verify that it successfully parses
> the configs and proceeds to shut down the parent.
>
>
> May we see these modifications (git patch)? Maybe there's some additional
> logging missing here.
>

Sure: https://github.com/prezi/icinga2/compare/prezi-release
Specifically, I meant this commit:
https://github.com/prezi/icinga2/commit/ad90733b67a204754523206e757c48f948ae906a
and another change that I haven't checked in (it makes sure the
child's stdout is not swallowed):
https://gist.github.com/zsol/00d5bb59b12d48406810


>
> This is a big problem for us because we have a biggish config (about 30K
> services and 90K Notifications), so starting up (or validating the
> configuration) takes about 5 minutes on a decent machine, which means when
> this scenario happens, we're flying blind for that amount of time.
>
>
> Just curious - what's a "decent machine"? 5 minutes sounds way too much
> for that number of objects.
>

It's a c3.2xlarge instance on AWS EC2:
http://www.ec2instances.info/?filter=c3.2xl

Good to hear, because we thought it was odd, too :) Maybe I'll
find some time to profile config parsing.
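
As a first coarse measurement before any real profiling, something like the sketch below could time a single validation run. The helper name is made up, and the validation command itself is not hard-coded: whatever command line validates the config on the installed version is passed in as arguments.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once; return (elapsed_seconds, exit_code)."""
    start = time.monotonic()
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.monotonic() - start, result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to
    # time, e.g. the config validation command line.
    elapsed, code = time_command(sys.argv[1:])
    print(f"exit code {code}, took {elapsed:.1f}s")
```

This only gives wall-clock time for the whole run; finding out where the time goes would still need a proper profiler.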


>
>
> Any pointers are appreciated.
>
>  [apologies for possibly duplicate emails, I think one copy of this is
> sitting in the moderation queue]
>
>
> Will remove that later on, no worries.
>
> Kind regards,
> Michael
>
>
>
> --
> Michael Friedrich, DI (FH)
> Application Developer
>
> NETWAYS GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg
> Tel: +49 911 92885-0 | Fax: +49 911 92885-77
> GF: Julian Hein, Bernd Erk | AG Nuernberg HRB18461
> http://www.netways.de | [email protected]
>
> ** Puppet Camp Duesseldorf 2014 - October - netways.de/puppetcamp **
> ** OSMC 2014 - November - netways.de/osmc **
> ** OpenNebula Conf 2014 - December - opennebulaconf.com **
> ** OSDC 2015 - April - osdc.de **
>



-- 

*Zsolt Dollenstein*
Developer at Prezi <http://prezi.com>
_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users
