> On 25.10.2023 at 17:07, Theo de Raadt <dera...@openbsd.org> wrote:
>
> Claudio Jeker <cje...@diehard.n-r-g.com> wrote:
>
>> On Wed, Oct 25, 2023 at 11:57:54AM +0200, Mike Fischer wrote:
>>> I have been observing occasional bouts of high load averages on several
>>> servers I administer and I am trying to find the cause. (I monitor these
>>> machines so that I can implement corrective measures in case of any
>>> malicious or abnormal activity. I think this is benign, but I’d still
>>> like to find the cause.)
>>>
>>> Once the high load average starts, only a reboot seems to (temporarily)
>>> return the values to their normal levels.
>>>
>>> The actual CPU usage (as measured by vmstat) stays low even if the load
>>> average is elevated.
>>>
>>> The servers are VMs running on a VMWare host (ESXi). This was seen with
>>> OpenBSD 7.3 and 7.4 amd64.
>>>
>>> I can not determine anything inside the VM that causes this. There seems
>>> to be no correlation to pfstat(8) graphs, log entries, known events, or
>>> anything else I can determine. Restarting all of the rc.d services never
>>> made any difference.
>>>
>>> Could this be caused by something on the VMWare host machine? (The host
>>> seems to be operating at limit regarding RAM for example. But the VM is
>>> only using the normal percentage of its allocated RAM — way below 100%
>>> and very constant usage, no swap.)
>>>
>>> How can I further debug this, keeping in mind that these are production
>>> machines and experimentation is limited to benign things that don’t
>>> cause outages.
>>>
>>
>> What is high? A high CPU load for me is in the order of 70+.
>> Please remember the CPU load average is a horrible leftover from tenex
>> days. The system just counts how many processes are runnable but it is a
>> very bad indicator of actual CPU load.
>
> Furthermore, every operating system counts this in a different way.
> You might think there is only one way to count it. Not at all.
True. But as I said, this was noticed because of a sudden increase on the same (OpenBSD) machine without any obvious reason. I am not implying that a value of 0.7 is in any way critical, only that an increase from a long-term load average of 0.0x to 0.7x is noteworthy.

I have no issue when the load increases while a machine is handling requests or doing something I know about. But then the load should drop back to normal levels once the task is finished. That did not happen in the cases I’m trying to figure out.

Thanks!
Mike
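For reference, the count of runnable processes that Claudio mentions is fed into an exponentially damped average. The Python sketch below is a simplified model of the classic BSD scheme, assuming a sample every 5 seconds and decay factors exp(-5/60), exp(-5/300) and exp(-5/900) for the 1-, 5- and 15-minute figures. The OpenBSD kernel does this in fixed-point arithmetic and has its own rules for what gets counted, so treat this as an illustration rather than its actual code; the names `update` and `simulate` are hypothetical.

```python
import math

# Assumed model parameters (classic BSD scheme, not OpenBSD's exact code).
SAMPLE_INTERVAL = 5.0            # seconds between samples of the run queue
PERIODS = (60.0, 300.0, 900.0)   # 1-, 5- and 15-minute averages, in seconds

# Per-sample decay factor for each average: exp(-interval / period).
DECAY = tuple(math.exp(-SAMPLE_INTERVAL / p) for p in PERIODS)


def update(loads, nrun):
    """Fold one sample (nrun = processes counted at this tick) into each average."""
    return tuple(l * d + nrun * (1.0 - d) for l, d in zip(loads, DECAY))


def simulate(counts, start=(0.0, 0.0, 0.0)):
    """Apply a sequence of per-sample counts, starting from the given averages."""
    loads = start
    for n in counts:
        loads = update(loads, n)
    return loads


# Usage: one process counted for 30 minutes (360 samples), then nothing
# counted for 10 minutes (120 samples).
busy = simulate([1] * 360)
idle = simulate([0] * 120, start=busy)
print("after 30 min busy: %.2f %.2f %.2f" % busy)
print("after 10 min idle: %.2f %.2f %.2f" % idle)
```

In this model a figure that stays near 0.7 instead of decaying means something is still being counted in roughly 70% of the samples; once the count really returns to 0, a 1-minute value starting around 1 falls below 0.001 within about 10 minutes.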