On Fri, May 19, 2017 at 10:28:05AM +0200, Florian Lohoff wrote:
> I can even see this in collectd stats of the host in grafana.
>
> https://silicon-verl.de/home/flo/tmp/Grafana%20-%20Host%20Overview%202017-05-17%2019-37-34.png
>
I produced a gnuplot graph from the debug.log showing the number of
checks scheduled per second, with 3 rolling averages on top - 10, 30
and 60 seconds.

http://silicon-verl.de/home/flo/tmp/icinga2-2.6.3-scheduler-20170519-917-952.png

As one can see, the checks/s spike to ~370. The 60 second average is
~80 checks/s, so we spike to about 4.6 times the 60 second average.

This is an instance with ~194 hosts, 4796 checks and ~800 dependencies.

Currently the checker's concurrent_checks is set to 80. Setting it to
the default of "0" makes the problem even worse.

The event at 9:36:44 is a "service icinga2 reload" issued by the
automatic config generation. Icinga stops scheduling checks for nearly
a minute. After the reload the 10s average nearly doubles.

It seems that reloading the instance completely breaks the fan-out of
the service check scheduling.

What I did:

    grep "debug/CheckerComponent: Executing check for" /var/log/icinga2/debug.log \
        | sed -e 's/ debug.*//' \
        | sort \
        | uniq -c \
        | sed -e 's/\[//g' -e 's/ [+].*$//' \
        | awk '{ print $3 " " $1 }' \
        >data

Using this gnuplot command file:

    set terminal png small size 1024,768
    set output "data.png"
    set xlabel "Time"
    set xdata time
    set autoscale xy
    set grid
    set format x "%H:%M:%S"
    set ylabel "checks/s"
    set timefmt "%H:%M:%S"
    set title "scheduled checks per second"

    min(a,b) = a >= b ? b : a
    samples(n) = min(int($0), n)
    avg_data = ""

    sum_n(data, n) = ( n <= 0 ? 0 : word(data, words(data) - n) + sum_n(data, n - 1))
    avg(x, n) = ( avg_data = sprintf("%s %f", (int($0)==0)?"":avg_data, x), sum_n(avg_data, samples(n))/samples(n))

    plot "data" using 1:2 title "Check/s" lt rgb "#A9A9A9" with lines,\
         '' using 1:(avg(column(2),10)) title "10s average" with lines,\
         '' using 1:(avg(column(2),30)) title "30s average" with lines,\
         '' using 1:(avg(column(2),60)) title "60s average" with lines

Flo
-- 
Florian Lohoff                                                 f...@zz.de
        UTF-8 Test: The 🐈 ran after a 🐁, but the 🐁 ran away
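
One caveat about the numbers above: the grep/uniq pipeline only emits a row for seconds in which at least one check was executed, and the gnuplot averages run over the last N rows of the data file, so during a pause (such as the reload) an "N second" average is not over N wall-clock seconds. The following is a minimal Python sketch, not the method used for the graph, that counts checks per second, fills idle seconds with zero and computes a trailing average over real seconds. It assumes debug.log lines start with a timestamp like "[2017-05-19 09:36:44 +0200]", which is what the sed expressions above imply; the names checks_per_second and trailing_average and the sample log lines are made up for illustration.

```python
import re
from collections import Counter, deque
from datetime import datetime, timedelta

# Assumed line layout: "[YYYY-MM-DD HH:MM:SS +ZZZZ] debug/CheckerComponent: Executing check for ..."
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [+-]\d{4}\] "
    r"debug/CheckerComponent: Executing check for"
)


def checks_per_second(lines):
    """Return [(timestamp, count), ...] with one entry per wall-clock second.

    Seconds without any executed check are included with a count of 0.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            counts[stamp] += 1
    if not counts:
        return []
    series = []
    current, last = min(counts), max(counts)
    while current <= last:
        series.append((current, counts[current]))
        current += timedelta(seconds=1)
    return series


def trailing_average(values, window):
    """Average of the current value and up to window-1 preceding values."""
    recent = deque(maxlen=window)
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Made-up sample lines, only to show the shape of the input.
sample = [
    "[2017-05-19 09:36:40 +0200] debug/CheckerComponent: Executing check for 'host1!ping4'",
    "[2017-05-19 09:36:40 +0200] debug/CheckerComponent: Executing check for 'host1!ssh'",
    "[2017-05-19 09:36:41 +0200] information/Application: unrelated line",
    "[2017-05-19 09:36:43 +0200] debug/CheckerComponent: Executing check for 'host2!ping4'",
]

series = checks_per_second(sample)
counts = [count for _, count in series]
avg = trailing_average(counts, 3)

for (stamp, count), mean in zip(series, avg):
    print(stamp.strftime("%H:%M:%S"), count, round(mean, 2))
```

For a real run, pass the open log file instead of the sample list, e.g. checks_per_second(open("/var/log/icinga2/debug.log")), and use windows of 10, 30 and 60. The peak-to-average ratio quoted above would then be max(counts) divided by the 60 second average at the same point in time.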
_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org
https://lists.icinga.org/mailman/listinfo/icinga-users