On 02/09/2018 01:56 AM, Gert van den Berg wrote:

My Check_MK 1.2.8.something installation started having significant
performance issues after a reboot. After upgrading Check_MK to
1.4.something and the Proxmox that the VM is running on to 4.x, the
problem remained.

Eventually I tried downgrading to a pre-Meltdown kernel version,
currently on 3.10.0-693.11.1.el7, which resolved the problem. (There
might be some slightly newer options as well - something from
late-2017 seemed like the easiest to know)

(This is a VM with around 3500 services and 12 vCPUs assigned. It
was on 8 previously; I increased that to try to resolve the problem.)

Symptoms were lots of check timeouts, constant high CPU usage and a
load average in the 50s.

Has anyone else seen this? Is there a workaround that does not involve
holding the kernel version back?

Obviously everybody's Check_MK setup can be different, but we're on the 693 kernel on 7 with version 1.2.8p6, and we're not seeing this. 99% of the hosts we monitor are also running the 693 kernel (or the one for CentOS 6).

We're only monitoring 4530 services. I don't recommend going too much higher than that per check_mk.

Our Check_MK runs on a physical host (a Dell PowerEdge 710, and thus "out of band"), but a lot of what we monitor are VMs under oVirt.

Maybe something with Proxmox?

We have plans to set up a Check_MK (OMD) instance inside an oVirt VM to monitor particular VMs we're not currently monitoring, but we haven't done that yet.
checkmk-en mailing list
