On Wed, 28 Sep 2011 10:18:55 +0200 Rob van der Heij wrote:

> Including the "D" in the count as "competing for CPU resources" is
> motivated by the assumption they will soon be "R" again.

Which is all fine *if* everyone plays by the "rules". Short-lived "D"
is fine - say from an interrupt handler like it was (presumably)
designed for.
Unfortunately it got loose in the developer world. If code marks a
thread as uninterruptible sleep, it is exactly that. It accepts no
signal *at all*. And unlike zombies, such threads never get reaped if
the parent task goes away.
Imagine a sysadmin rings and says his Bozo AG system (which runs all of
Prod) has just stopped and issued a message saying the loadavg is above
(say) 250% of the number of engines.
Running tasks - 2; engines 4; loadavg 11; CPU% ~0 !!!
Ummmm. Adding a couple of virtual CPs gets the ratio down, but what do
you do when in 2 hours' time the loadavg is 20, then later 100?
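
To make that concrete, here is a rough Python sketch (mine, not anything
Bozo AG ships) that counts tasks per state from /proc and applies the
same sort of "loadavg above 250% of engines" test. The function names
and the 2.5 ratio are made up for illustration; it assumes a Linux
/proc where every thread has a /proc/<pid>/task/<tid>/stat file.

```python
import glob
import os
from collections import Counter


def task_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so cut at the last ')' instead of splitting naively.
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def summarise(stat_lines):
    """Count tasks per state letter (R, D, S, Z, ...)."""
    return Counter(task_state(line) for line in stat_lines)


def over_threshold(loadavg, engines, ratio=2.5):
    """Hypothetical alarm: loadavg above 250% of the engine count."""
    return loadavg > ratio * engines


def read_stat_lines():
    """Collect the stat line of every thread; tasks may exit mid-scan."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path, errors="replace") as f:
                text = f.read()
        except OSError:
            continue  # thread went away between glob() and open()
        if text:
            lines.append(text)
    return lines


if __name__ == "__main__":
    states = summarise(read_stat_lines())
    print("R:", states["R"], "D:", states["D"])
    try:
        load1 = os.getloadavg()[0]
        print("alarm:", over_threshold(load1, os.cpu_count() or 1))
    except (OSError, AttributeError):
        print("load average not available on this platform")
```

With the numbers above, over_threshold(11, 4) trips because 11 > 10.
Going to 6 engines lifts the limit to 15 and the alarm clears, but at
loadavg 20 it trips again, and the R/D counts show the CPUs were never
the problem.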

This is just a terrible metric to base decisions on.
A re-boot is required to clear a situation like the above - and it
happens.

Shane ...

