Hans Scheffers
AIX / Linux System Administration
> Date: Sat, 2 Nov 2013 00:03:19 +0100
> From: michael.friedr...@gmail.com
> To: icinga-users@lists.sourceforge.net
> Subject: Re: [icinga-users] Reaper
>
> On 30.10.2013 06:14, Hans Scheffers wrote:
> > Hi,
> > I see the following messages in the logfiles:
> >
> > [1383078054] Warning: Breaking out of check result reaper: max reaper
> > time (15) exceeded. Reaped 53 results, but more checkresults to
> > process. Perhaps check core performance tuning tips?
> >
> > System: opensuse 12.2, with ido2db, snmptt, pnp4nagios ; quad core
> > Xeon proc, 8g mem, 1 hardware raid 0 disk.
> >
> > Icinga.cfg:
> > #check_result_reaper_frequency=10
> > #max_check_result_reaper_time=30
> > check_result_reaper_frequency=5
> > max_check_result_reaper_time=15
> >
> > I also tried bigger reaper_time and smaller frequency but no luck. Can
> > anyone explain the tuning of the reaper some more? Can there be more
> > reapers running simultaneously?
>
> no, that doesn't make much sense, since core 1.x is not multithreaded at
> all.
Ok, so that's not the way to go ;)
>
> it obviously looks like your core generates a lot of checkresults
> in a short period of time, and therefore the checkresult reaper cannot
> process them that fast. it also sounds like the in-memory list of
> checkresults has grown huge and takes ages to be processed.
>
> after all, it would be interesting to know at which interval the 5000 service
> checks are being run, and how long their execution time is. some
> icingastats and system performance graphing over time would help as well
> for the reader.
We have 4000 ~ 4500 checks that run at the normal intervals or longer
(some of them only once a day).
The rest of the checks have to be executed every 2 minutes as per SLA, so yes,
a lot of checkresults are generated, and we will need even more in the
(near) future. At the moment we're running in the test environment, but in a
week we will also go live with a DRS.
The number of 2-minute checks will then double (and test will go back to 5
minutes).
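As a rough sanity check on these numbers, here is a minimal Python sketch that compares the rate at which checkresults arrive with the rate the reaper achieved in the logged warning. The split of 4500 checks at a 5-minute interval and 500 checks at 2 minutes is an assumption made for illustration (the thread only gives "4000 ~ 4500" and a total of about 5000), and a single log line is only one observation, not a measured average.

```python
def results_per_second(checks_by_interval):
    """Sum count / interval_seconds over each group of checks."""
    return sum(count / interval for interval, count in checks_by_interval.items())

# Assumed split of the ~5000 checks mentioned in the thread:
# 4500 at an assumed 5-minute interval, 500 at the 2-minute SLA interval.
load = {300: 4500, 120: 500}
rate = results_per_second(load)

# From the log line: 53 results reaped within the 15 s max reaper time.
reaped_rate = 53 / 15

print(f"incoming: {rate:.1f} results/s, reaped: {reaped_rate:.1f} results/s")
```

If the incoming rate stays well above what the reaper manages per run, the backlog keeps growing no matter how the two reaper settings are balanced, which would fit what I saw.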
The checks that are generating the problems are the WMI checks: as soon as I
shut off these checks, the system runs fine with 4000 ~ 4500 checks.
We are now running the WMI checks on slightly heavier hardware (a Linux ppc
partition on a P710 with v3700 SAN storage), and our reaper isn't
complaining anymore. The latency is now about 5 sec max (on both systems).
We are not generating graphs on the PPC LPAR at the moment, because this is the
first test (started Wednesday).
Our main goal now is to update the PPC to OpenSuSE 12.3 with icinga >= 1.9 and
move all the tests to this LPAR (if needed we can still extend the hardware a
little). Then we will also generate the performance stats again :)
>
> https://wiki.icinga.org/display/howtos/Icinga+performance+analysis
> https://wiki.icinga.org/display/howtos/Optimize+Icinga+Performance
I have read them, but the tuning of the reaper in this piece is covered a
little bit sparsely... I still don't really know how to determine the optimal values.
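One way to at least see how often and how badly the reaper falls short is to pull the warnings out of the log. The sketch below is only an illustration: the regular expression is written against the single warning line quoted earlier in this thread, and the names `REAPER_RE` and `reaper_breakouts` are made up for this example. It does not compute optimal values, it only collects the timestamp, the configured max reaper time and the number of results reaped.

```python
import re

# Pattern based on the warning line quoted in this thread.
REAPER_RE = re.compile(
    r"\[(\d+)\] Warning: Breaking out of check result reaper: "
    r"max reaper time \((\d+)\) exceeded\. Reaped (\d+) results"
)

def reaper_breakouts(lines):
    """Yield (timestamp, max_time, reaped) for each reaper warning line."""
    for line in lines:
        m = REAPER_RE.search(line)
        if m:
            yield tuple(int(g) for g in m.groups())

sample = [
    "[1383078054] Warning: Breaking out of check result reaper: max reaper "
    "time (15) exceeded. Reaped 53 results, but more checkresults to "
    "process. Perhaps check core performance tuning tips?",
]
events = list(reaper_breakouts(sample))
print(events)
```

Feeding it the lines of the real logfile instead of `sample` would show whether the warnings cluster around certain times, for example when the 2-minute WMI checks come in.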
>
>
> --
> DI (FH) Michael Friedrich