On 02/09/2018 01:56 AM, Gert van den Berg wrote:
Obviously everybody's Check_MK setup can be different. We're on the 693
kernel on 7 with version 1.2.8p6, and we're not seeing this. 99% of the
hosts monitored are also running the 693 kernel (or the one for CentOS 6).
My Check_MK 1.2.8.something version started having significant
performance issues after a reboot. After upgrading Check_MK to
1.4.something and the Proxmox that the VM is running on to 4.x, the
Eventually I tried downgrading to a pre-Meltdown kernel version
(currently 3.10.0-693.11.1.el7), which resolved the problem. (There
might be some slightly newer options as well; something from
late 2017 seemed like the easiest to be sure of.)
(This is a VM with around 3500 services and 12 vCPUs assigned. It
was on 8 previously; I increased that to try to resolve the problem.)
Symptoms were lots of check timeouts, constant high CPU usage and a
load average in the 50s.
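For anyone comparing numbers, a rough way to put a load average like
that in context is to divide it by the number of vCPUs. The sketch
below is only an illustration, not something used in this thread: the
function names and the threshold of 1.0 per CPU are assumptions chosen
for the example.

```python
import os
import platform


def load_per_cpu(load_avg, cpu_count):
    """Return the load average divided by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


def is_overloaded(load_avg, cpu_count, threshold=1.0):
    """True if the per-CPU load exceeds the (assumed) threshold."""
    return load_per_cpu(load_avg, cpu_count) > threshold


if __name__ == "__main__":
    try:
        # 1-, 5- and 15-minute load averages of the local machine.
        one_min, _, _ = os.getloadavg()
    except OSError:
        one_min = None  # load average not available on this platform
    cpus = os.cpu_count() or 1
    print("kernel release:", platform.release())
    if one_min is not None:
        print("load per CPU: %.2f" % load_per_cpu(one_min, cpus))
        print("overloaded:", is_overloaded(one_min, cpus))
```

With the figures quoted above, a load of about 50 on 12 vCPUs is
roughly 4 runnable tasks per vCPU, well past the point where checks
would be expected to queue up.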
Has anyone else seen this? Is there a workaround that does not involve
holding the kernel version back?
We're only monitoring 4530 services. I don't recommend going too much
higher than that per check_mk.
Our Check_MK runs on a physical host (thus "out of band", a Dell
PowerEdge 710), but a lot of what we monitor are VMs under oVirt.
Maybe something with Proxmox?
We have plans to put up a Check_MK (OMD) instance inside an oVirt VM to
monitor particular VMs we're not currently monitoring, but we haven't
done that yet.
checkmk-en mailing list