Mike Fischer <fischer+o...@lavielle.com> writes:
> I have been observing occasional bouts of high load averages on
> several servers I administer and I am trying to find the cause. (I
> monitor these machines so that I can implement corrective measures in
> case of any malicious or abnormal activity. I think this is benign,
> but I’d still like to find the cause.)
>
> Once the high load average starts, only a reboot seems to (temporarily)
> return the values to their normal levels.
>
> The actual CPU usage (as measured by vmstat) stays low even if the load
> average is elevated.
>
> The servers are VMs running on a VMware host (ESXi). This was seen with
> OpenBSD 7.3 and 7.4 amd64.
>
> I cannot determine anything inside the VM that causes this. There
> seems to be no correlation to pfstat(8) graphs, log entries, known
> events, or anything else I can determine. Restarting all of the rc.d
> services never made any difference.
>
> Could this be caused by something on the VMware host machine? (The
> host seems to be operating at its limit regarding RAM, for example. But
> the VM is only using the normal percentage of its allocated RAM, way
> below 100% and very constant usage, no swap.)
>
> How can I further debug this, keeping in mind that these are production
> machines and experimentation is limited to benign things that don’t cause
> outages?
>

Can you share a dmesg from one of the 7.4 VMs? The output of `vmstat -iz`
might help narrow it down to a stuck interrupt. Also, try running systat(1)
and observe things as they happen.
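
To make a stuck interrupt easier to spot, one option is to save two `vmstat -iz` snapshots a minute or so apart and compare the totals. The following is only a rough sketch: the function name `irq_delta` and the file names are made up for illustration, and it assumes the usual three-column output (interrupt name, total, rate) with a header line and a final `Total` row. It only reads saved files, so it should be harmless on a production machine.

```sh
#!/bin/sh
# irq_delta BEFORE AFTER
# Print every interrupt source whose total grew between two saved
# `vmstat -iz` snapshots, followed by the size of the increase.
# (Hypothetical helper; assumes "name total rate" columns.)
irq_delta() {
    awk '
        # Skip short lines, the header, and the summary row.
        NF < 3 || $1 == "interrupt" || $1 == "Total" { next }
        {
            # The name is everything before the last two fields.
            name = $1
            for (i = 2; i <= NF - 2; i++) name = name " " $i
            total = $(NF - 1) + 0
        }
        # First file: remember the starting totals.
        FNR == NR { before[name] = total; next }
        # Second file: report sources that advanced.
        (name in before) && total > before[name] {
            printf "%s %d\n", name, total - before[name]
        }
    ' "$1" "$2"
}

# Example use (file names are arbitrary):
#   vmstat -iz > /tmp/irq.before
#   sleep 60
#   vmstat -iz > /tmp/irq.after
#   irq_delta /tmp/irq.before /tmp/irq.after
```

A source that advances far faster than the others while the machine is otherwise idle would be the first thing to look at, ideally next to the dmesg so the device behind it can be identified.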