Re: What could cause high CPU load averages (no actual CPU usage)?

Justin Yates Fletcher Wed, 25 Oct 2023 17:14:52 -0700

On Wed, 2023-10-25 at 21:12 +0200, Mike Fischer wrote:
> 
> > Am 25.10.2023 um 17:57 schrieb Theo de Raadt <dera...@openbsd.org>:
> > 
> > Mike Fischer <fischer+o...@lavielle.com> wrote:
> > 
> > > > Am 25.10.2023 um 17:29 schrieb Theo de Raadt
> > > > <dera...@openbsd.org>:
> > > > 
> > > > Mike Fischer <fisc...@lavielle.com> wrote:
> > > > 
> > > > > True. But like I said, this was noticed because of the sudden
> > > > > increase on the same (OpenBSD) machine without any obvious
> > > > > reason.
> > > > 
> > > > The reason is obvious.
> > > > 
> > > > You installed a completely different system.
> > > 
> > > No, there is a misunderstanding here. I have not been comparing
> > > OpenBSD load averages to those on any other OS.
> > 
> > No, it is *your misunderstanding*
> > 
> > We put no effort into maintaining stability of this damn number.
> 
> Ok, I realise that load average may be too irrelevant a measurement to
> take seriously. I admit that I thought this value was somewhat
> consistent in the context of a single running machine, but maybe I
> was wrong.
>


Load average is fine to measure, but I think the point you are missing
is how small the change was: you went from 0.0 to 0.7 (iirc).


> 
> > We changed a lot of kernel scheduling code *without giving a damn
> > about the
> > stability of this number*
> 
> Fine, but you are not changing my running kernel, are you?
> 

I don't understand your point with this. Are you making an accusation?
If not, then why even write this?


> Or are you saying that the load average does not carry *any* inherent
> information and is utterly useless? That would almost imply that this
> is a (poor) sort of random number generator.
> 

Nope. That is not the case and nobody has said that. You saw a load
average change from 0.0 to some other number greater than 0.0 but less
than 1. You are trying to imply that this delta means something.

You have been told many times that it does not. It *can* mean
something, but only when it is understood in the context of other
measurements.
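
As a rough illustration of reading the value in context, here is a
minimal Python sketch that puts the load average next to the number of
CPUs. The helper name load_per_cpu is hypothetical, and dividing by the
CPU count is only one common rule of thumb, not something OpenBSD
defines or guarantees.

```python
import os


def load_per_cpu(load, ncpu):
    """Return a load average divided by the number of CPUs.

    Hypothetical helper: a rule-of-thumb normalisation, not an
    OpenBSD-defined metric.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return load / ncpu


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages.
    one, five, fifteen = os.getloadavg()
    # os.cpu_count() may return None, so fall back to 1.
    ncpu = os.cpu_count() or 1
    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        print(f"{label}: load {value:.2f}, per CPU {load_per_cpu(value, ncpu):.2f}")
```

On a single-CPU machine a load of 0.7 stays 0.7 per CPU; on a four-CPU
machine the same reading is under 0.2 per CPU. Either way it is one
number among several.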

> OTOH years of monitoring this value (amongst many other measurements)
> on OpenBSD seems to indicate some correlation to what the machine is
> doing. But I get what you are saying: no guarantees.
> 

Nope. You have still misunderstood what is being said. That is
especially clear when you say this is a 7.4 machine and that you have
monitored it for years...  The best you could say for 7.4 is that you
have monitored it for almost 2 weeks.

And you have said it is a VM on VMware. You have a *huge* variable you
have only lightly taken into consideration. Monitoring system
performance on a VM is an exercise in futility without the underlying
host information.

And in my experience, still an exercise in futility even given the
underlying host information...  It is many opaque layers of
abstraction.


> 
> > It is a different system.
> 
> To reiterate: I am measuring load averages on OpenBSD 7.4. On a
> running system I notice a sudden jump in the value which persists for
> several hours. That gets my attention because I can see no reason for
> this jump. So I’m trying to figure out the cause.
> 

Your jump was less than 1. On a graph with a scale from 0 to 1, that
looks "huge"!

Ignore that and pay attention to the value. And understand that in
context.

> Please note that I am not going on the assumption that there is a bug
> or that something needs to be changed/fixed in OpenBSD. The jump may
> have had perfectly valid reasons. Or it may have been random with a
> low probability.


The "jump" you mention doesn't mean anything. Without context it means
less than nothing, as has already been mentioned.

It *might* mean that your rrd graph metric gathering is affecting your
graphs in a way that you have not seen before...  Monitoring uses
resources!

The goal is simple: if the system is doing what it needs to do, at a
rate that meets your needs, then the problem is solved.


> But given all of the feedback from this thread I’ll deprecate this
> part of my monitoring and switch to monitoring actual CPU activity
> (as reported by e.g. vmstat) in the hopes that these values are more
> accurate/consistent and that they better reflect the workload of the
> machine.
> 
> 
> 

No! That would be a bad option, IMHO. Load average is a metric that can
be valuable, but good system administration means taking all the values
and understanding them in context.
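
To make "all the values in context" a bit more concrete, here is a
small Python sketch that reports the load average side by side with a
CPU busy fraction computed from two samples of per-state tick counters.
The function names, the sample numbers and the position of the idle
counter are all assumptions for illustration; where the counters come
from (vmstat or anything else) is up to your own tooling.

```python
def busy_fraction(before, after, idle_index):
    """Fraction of ticks between two samples that were not idle.

    `before` and `after` are sequences of cumulative tick counters,
    one per CPU state. `idle_index` is the position of the idle
    counter (an assumption supplied by the caller).
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0  # no ticks elapsed between the samples
    return 1.0 - deltas[idle_index] / total


def report(load, ncpu, busy):
    """Format load average and CPU busy fraction on one line."""
    return f"load {load:.2f} on {ncpu} CPU(s), CPU busy {busy:.0%}"


if __name__ == "__main__":
    # Made-up sample data: four counters, the last one being idle.
    before = [100, 0, 50, 850]
    after = [110, 0, 55, 935]
    busy = busy_fraction(before, after, idle_index=3)
    print(report(0.7, 1, busy))
```

Seeing a load of 0.7 next to a mostly idle CPU tells you more than
either number does alone, which is the whole point.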


> Thanks everyone!
> Mike
> 


Hope that helps,
Justin
