On 08.10.2014 at 11:41, Zsolt Dollenstein wrote:


On Tue, Oct 7, 2014 at 6:00 PM, Michael Friedrich <[email protected]>
wrote:

    Hi,

    On 01.10.2014 at 10:55, Zsolt Dollenstein wrote:
    Hi,

    We at Prezi are trying to migrate over to icinga2 and we've hit
    what seems like a showstopper for us. We've spent about 2 days
    trying to debug the issue to no avail, so any pointers are welcome.

    Which version of Icinga 2, and how was Icinga 2 installed on which
    distribution?


We are running off of the current master (with these
<https://github.com/prezi/icinga2/compare/prezi-release> changes) on
ubuntu. We built icinga2 with the debian packaging mechanisms in the
repo (using dpkg-buildpackage).

Uhm. Keep in mind that the master branch is used for current
development towards the 2.2 feature milestone. Today the CLI commands
base was merged, which introduces certain changes.

If I were you, I would go for support/2.1 and build based upon that, and
only switch to master if this is really just a playground and developers
ask you to test something.



    In short, the issue is this: sometimes when we reload our icinga2
    config (via SIGHUP), both the new and old icinga2 processes stop
    working. This happens about once every 4 reloads.

    What comes to mind: try attaching strace and/or gdb to the two
    processes and trace their actions after sending the SIGHUP signal.

    http://docs.icinga.org/icinga2/latest/doc/module/icinga2/chapter/troubleshooting#debug


Thanks, we haven't tried gdb yet, will give it a shot soon.
Strace was not terribly helpful because of the amount of active checks
(maybe we should try without tracing forks and just attach it to the
two processes).

Hmmm, yeah, I'd only look into what the "parent" process is doing before
it stops its operation.
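
A minimal sketch of what attaching to just the two processes could look
like. The helper name and the log location under /tmp are assumptions for
illustration only. It prints the commands instead of running them, so they
can be reviewed and started in separate terminals right before sending the
SIGHUP:

```shell
#!/bin/sh
# Sketch: print strace and gdb invocations for each PID given as argument.
# Assumptions: the helper name and the /tmp log paths are arbitrary choices.
icinga2_trace_cmds() {
    for pid in "$@"; do
        # No -f here, so forked check plugins are not followed.
        # Caveat: without -f, strace attaches only to the given thread,
        # not to the other threads of the process.
        echo "strace -tt -p $pid -o /tmp/icinga2-strace.$pid.log"
        # One-shot backtrace of all threads; gdb detaches again afterwards.
        echo "gdb -p $pid -batch -ex 'thread apply all bt'"
    done
}
```

For example, `icinga2_trace_cmds $(pgrep -x icinga2)` would print one
strace and one gdb line per running process, assuming the process is
actually named icinga2 on your system.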

What comes to mind - when the old parent process receives the
termination signal, it stores its state data into the icinga2.state
file. Once the process has terminated successfully, the child process
takes over and re-reads the state file to ensure that the history is w/o
loss after doing config validation. Might be some sort of race condition
over here, but that's just a blind guess.

The current git master moves the daemon code into the cli subcommand, so
you'll find it below lib/cli/daemoncommand.cpp for now (this could
change in the coming weeks though).

For compiling - the package build uses the release flag with debug
symbols. You might want to recompile in debug mode to get more
information too. CMAKE_BUILD_TYPE=Debug is what you're looking for.

Below is an excerpt of my bashrc I'm using for building different types
of Icinga 2.

# CMake options for a debug build and for a release build with debug symbols
export CMAKE_OPTS_DEBUG="-DCMAKE_INSTALL_PREFIX=/usr \
-DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var \
-DCMAKE_BUILD_TYPE=Debug -DICINGA2_USER=icinga -DICINGA2_GROUP=icinga \
-DICINGA2_COMMAND_USER=icinga -DICINGA2_COMMAND_GROUP=icingacmd"
export CMAKE_OPTS_NORMAL="-DCMAKE_INSTALL_PREFIX=/usr \
-DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_INSTALL_LOCALSTATEDIR=/var \
-DCMAKE_BUILD_TYPE=RelWithDebInfo -DICINGA2_USER=icinga \
-DICINGA2_GROUP=icinga -DICINGA2_COMMAND_USER=icinga \
-DICINGA2_COMMAND_GROUP=icingacmd"

# Each alias must stay on a single line: wipe build/, configure, install
alias icinga2_debug='rm -rf build ; mkdir build ; cd build ; cmake $CMAKE_OPTS_DEBUG .. ; sudo make -j8 install ; cd ..'
alias icinga2_normal='rm -rf build ; mkdir build ; cd build ; cmake $CMAKE_OPTS_NORMAL .. ; sudo make -j8 install ; cd ..'




    From the logs it looks like the old process thinks all is well
    and is terminating as expected (AFAICT the new process kills it
    properly). I can't find any logs from the new process, not to
    mention any errors/warnings. We have no idea why the new process
    stops. We have tried to turn on debug logging to no avail. We
    even tried patching the code to see more logs from the child
    process, and we were able to verify that it successfully parses
    the configs and proceeds to shut down the parent.

    May we see these modifications (git patch)? Maybe there's some
    additional logging missing here.


Sure, https://github.com/prezi/icinga2/compare/prezi-release

Hmmm. There are some patches on that list which would make sense
upstream, whilst otherwise you're keeping a local fork (no-one wants
that). Feel free to open issues with attached patches / pr urls.

specifically, I meant this:
https://github.com/prezi/icinga2/commit/ad90733b67a204754523206e757c48f948ae906a


That looks like an issue we have seen before, where a reload does not
log any feedback. I'll talk with Gunnar about that tomorrow.

and another which I haven't bothered to check in (this was to make
sure the child's stdout is not swallowed):
https://gist.github.com/zsol/00d5bb59b12d48406810


    This is a big problem for us because we have a biggish config
    (about 30K services and 90K Notifications), so starting up (or
    validating the configuration) takes about 5 minutes on a decent
    machine, which means when this scenario happens, we're flying
    blind for that amount of time.

    Just curious - what's a "decent machine"? 5 minutes sounds like way
    too much for that number of objects.


It's a c3.2xlarge type instance on AWS EC2:
http://www.ec2instances.info/?filter=c3.2xl

So 8 cores, 16gb and fast disks.


Awesome to hear this because we thought it was weird, too :) Maybe
I'll find some time to profile config parsing.

Most likely you'll go the Valgrind way, or profile only configuration
snippets.

https://blog.netways.de/2013/09/05/profiling-mit-gperftools/
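
A rough sketch of the Valgrind route, assuming callgrind is the tool of
choice. The helper name and the output path are hypothetical, and the
command to profile (for instance your config validation call) is left to
you, since its options depend on the Icinga 2 version you build:

```shell
#!/bin/sh
# Sketch: wrap an arbitrary command in Valgrind's callgrind profiler.
# Assumptions: the helper name and the /tmp output path are arbitrary.
callgrind_cmd() {
    out="/tmp/callgrind.icinga2.out"
    # Print the command instead of running it, so it can be reviewed first.
    echo "valgrind --tool=callgrind --callgrind-out-file=$out $*"
}
```

The resulting output file can then be inspected with callgrind_annotate.
Expect the run to be much slower than a normal start, so profiling a
small configuration snippet first is probably the saner option.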

Not sure what else may help, but we'll see.

Kind regards,
Michael



    Any pointers are appreciated.

    [apologies for possibly duplicate emails, I think one copy of
    this is sitting in the moderation queue]

    Will remove that later on, no worries.

    Kind regards,
    Michael


    -- 
    Michael Friedrich, DI (FH)
    Application Developer

    NETWAYS GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg
    Tel: +49 911 92885-0 | Fax: +49 911 92885-77
    GF: Julian Hein, Bernd Erk | AG Nuernberg HRB18461
    http://www.netways.de | [email protected]

    ** Puppet Camp Duesseldorf 2014 - October - netways.de/puppetcamp **
    ** OSMC 2014 - November - netways.de/osmc **
    ** OpenNebula Conf 2014 - December - opennebulaconf.com **
    ** OSDC 2015 - April - osdc.de **




--

*Zsolt Dollenstein*
Developer at Prezi <http://prezi.com>



_______________________________________________
icinga-users mailing list
[email protected]
https://lists.icinga.org/mailman/listinfo/icinga-users
