On Fri, May 19, 2017 at 10:28:05AM +0200, Florian Lohoff wrote:
> I can even see this in collectd stats of the host in grafana.
> 
> https://silicon-verl.de/home/flo/tmp/Grafana%20-%20Host%20Overview%202017-05-17%2019-37-34.png
> 

I produced a gnuplot graph from the debug.log showing the number of checks
scheduled per second, with three rolling averages on top: 10, 30 and 60 seconds.

http://silicon-verl.de/home/flo/tmp/icinga2-2.6.3-scheduler-20170519-917-952.png

As one can see, the checks/s spike to ~370, while the 60 second average is
~80 checks/s. So we spike to about 4.6 times the 60 second average.
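
As a quick sanity check on that factor, here is a small sketch. The helper
name spike_factor is hypothetical, and the two numbers are only the
approximate values read off the graph:

```shell
# Hypothetical helper: ratio of a peak rate to an average rate, one decimal.
spike_factor() {
    awk -v peak="$1" -v avg="$2" 'BEGIN { printf "%.1f\n", peak / avg }'
}

spike_factor 370 80    # prints 4.6
```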

This is an instance with ~194 hosts, 4796 checks and ~800
dependencies. Currently the checker's concurrent_checks is set to 80.
Setting it to the default "0" makes the problem even worse.

The event at 9:36:44 is a "service icinga2 reload" issued by the
automatic config generation. Icinga stops scheduling checks for nearly
a minute.

After the reload the 10s average nearly doubles. It seems that
reloading the instance completely breaks the fan-out
of the service check scheduling.

What I did:

grep "debug/CheckerComponent: Executing check for" /var/log/icinga2/debug.log \
        | sed -e 's/ debug.*//' \
        | sort \
        | uniq -c \
        | sed -e 's/\[//g' -e 's/ [+].*$//' \
        | awk '{ print $3 " " $1 }' \
        >data
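
One caveat with this pipeline: seconds in which no check was executed
produce no row at all, so a row-based average over "data" is not strictly
an average over seconds (for example during the reload gap). A minimal
sketch of how the gaps could be filled with explicit zero rows follows.
The helper name fill_gaps is hypothetical, and it assumes sorted
"HH:MM:SS count" rows from a single day with no wrap over midnight:

```shell
# Hypothetical helper: reads sorted "HH:MM:SS count" rows on stdin and adds
# an explicit "HH:MM:SS 0" row for every second that is missing in between.
# Assumption: all rows belong to the same day (no wrap over midnight).
fill_gaps() {
    awk '
    function to_sec(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    function to_hms(s) { return sprintf("%02d:%02d:%02d", int(s / 3600), int((s % 3600) / 60), s % 60) }
    {
        now = to_sec($1)
        # Emit a zero row for each second skipped since the previous row.
        if (NR > 1)
            for (s = prev + 1; s < now; s++)
                print to_hms(s), 0
        print $1, $2
        prev = now
    }'
}
```

It could be used as "fill_gaps <data >data.filled", with gnuplot then
reading data.filled instead of data.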

Using this gnuplot command file:

set terminal png small size 1024,768
set output "data.png"
set xlabel "Time"
set xdata time
set autoscale xy
set grid
set format x "%H:%M:%S"
set ylabel "checks/s"
set timefmt "%H:%M:%S"
set title "scheduled checks per second"


min(a,b) = a >= b ? b : a
samples(n) = min(int($0), n)
avg_data = ""
sum_n(data, n) = ( n <= 0 ? 0 : word(data, words(data) - n) + sum_n(data, n - 1))
avg(x, n) = ( avg_data = sprintf("%s %f", (int($0)==0)?"":avg_data, x), sum_n(avg_data, samples(n))/samples(n))

plot "data" using 1:2 title "Check/s" lt rgb "#A9A9A9" with lines,\
        '' using 1:(avg(column(2),10)) title "10s average" with lines,\
        '' using 1:(avg(column(2),30)) title "30s average" with lines,\
        '' using 1:(avg(column(2),60)) title "60s average" with lines
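
To cross-check the averages outside of gnuplot, a trailing average can also
be computed directly from the "data" file. This is only a sketch: the
helper name rolling_avg is hypothetical, and its window includes the
current row, whereas avg() above appears to sum the n rows before it, so
the two curves may be shifted by one sample against each other.

```shell
# Hypothetical helper: reads "HH:MM:SS count" rows on stdin and prints
# "HH:MM:SS count average", where average is the mean of the last n rows
# (current row included, fewer rows at the start of the file).
rolling_avg() {
    awk -v n="$1" '
    {
        sum += $2
        buf[NR] = $2
        # Drop the value that just fell out of the window.
        if (NR > n) {
            sum -= buf[NR - n]
            delete buf[NR - n]
        }
        rows = (NR < n) ? NR : n
        printf "%s %d %.2f\n", $1, $2, sum / rows
    }'
}
```

For example, "rolling_avg 60 <data" gives the 60 row average in the third
column.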

Flo
-- 
Florian Lohoff                                                 f...@zz.de
             UTF-8 Test: The 🐈 ran after a 🐁, but the 🐁 ran away

_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org
https://lists.icinga.org/mailman/listinfo/icinga-users
