On Wed, Sep 28, 2011 at 2:03 AM, Shane <[email protected]> wrote:
> On Tue, 27 Sep 2011 19:44:02 +0200 Rob van der Heij wrote:
>
>> They probably refer to "load average" which is the number of processes
>> that run or would be able to run. The easy math is that for an N-way
>> guest, a load average larger than N means that processes wait to get
>> CPU resources. Since CPU resources are often the most expensive part,
>> it's good to wait for CPU rather than something else. If your load
>> average is lower than N, there's no CPU waiting as far as Linux is
>> concerned.
>
> Note that this is the *Unix* definition of loadavg - not the Linux
> definition.
> Unfortunately one of those old saws that has been handed down
> through the generations. Software designers have been known to use this
> metric to (dynamically) limit connections to their software - database
> for example. Bad idea.
You're broadening the discussion a bit. It's good if the sysprog and
sysadmin and application owner coordinate the configuration. Too often
it does not happen that way because they don't understand each other.
And automatic application configuration rarely gets it right. Like the
database that configures itself to use 70% of installed memory. And so
does the next instance, and the next... :-(
:story.
Preparing to go live over the weekend, the DBA went through his tuning
again and checked memory configuration with the systems group. The
z/OS chap told him they had "40G allocated to the production Linux"
and our DBA set the SGA+PGA target at 32G - on all 10 database
guests... Monday by noon most of the 4G guests were swapping like
crazy...
> Mark gave the correct definition. Note that processes in
> uninterruptible sleep (state "D") comprise more than just those waiting
> on (disk) I/O to complete as is commonly assumed. HTTP servers parking
> threads as "D" then forgetting about them is not unknown. Makes an
> awful mess of the loadavg - and any software using the metric
> incorrectly.
> Mind you, it's hard to think of ways to use it correctly ...
Including the "D" in the count as "competing for CPU resources" is
motivated by the assumption they will soon be "R" again. z/VM keeps
guests with active I/O in queue as well to avoid them finding their
pages taken away when they return. And back then the protocol on the
CTC fooled z/VM into thinking they were about to run again.
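
To see how much of a given loadavg comes from "D" rather than "R", you
can count process states straight from /proc. A minimal sketch, assuming
a Linux /proc layout; the function names are mine, and it counts
processes rather than individual threads, so it only approximates what
the kernel feeds into loadavg:

```python
import os
from collections import Counter

def state_from_stat(stat_line):
    """Return the one-letter state from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain
    # spaces or ')', so the state is located after the LAST ')'.
    return stat_line[stat_line.rindex(")") + 2]

def count_states(proc_root="/proc"):
    """Count processes per state (R, S, D, ...) under proc_root."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[state_from_stat(f.read())] += 1
        except (OSError, ValueError, IndexError):
            continue  # process went away while we were looking
    return counts

if __name__ == "__main__":
    c = count_states()
    print("R:", c["R"], "D:", c["D"])
```

If "D" stays high while "R" is low, the loadavg is not telling you
much about CPU demand.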
I just did an experiment. After what others use, my guest can get
about half a CPU worth of resources. I started 4 processes burning CPU
cycles. The loadavg goes up slightly above 4 (my shell, snmpd, etc).
After a while, I enabled a 2nd virtual CPU, and load average remains 4
because there are 4 processes looping.
Screen: ESAWAIT4 Velocity Software-Test VSIVM4 ESAMON 3.808 09/27 23:3
1 of 1 Virtual CPU Wait State USER ROBLX1 CLASS * 20
                  <----------- Virtual CPU State Percentage ---------->
Time     User       Run CPUwt  CPwt Limit  IOwt PAGwt Other  Idle  Dorm
-------- -------- ----- ----- ----- ----- ----- ----- ----- ----- -----
23:31:00 ROBLX1     5.0  16.7     0     0     0   1.7     0  48.3 328.3
23:32:00 ROBLX1     6.7  31.7     0     0     0     0     0  50.0 311.7
23:33:00 ROBLX1    46.7  53.3     0     0     0     0     0     0 300.0
23:34:00 ROBLX1    51.7  48.3     0     0     0     0     0     0 300.0
23:35:00 ROBLX1    46.7  53.3     0     0     0     0     0     0 300.0
23:36:00 ROBLX1    46.7  53.3     0     0     0     0     0     0 300.0
23:37:00 ROBLX1    31.7 111.7     0     0     0   3.3     0  10.0 243.3
23:38:00 ROBLX1    45.0 155.0     0     0     0     0     0     0 200.0
23:39:00 ROBLX1    48.3 151.7     0     0     0     0     0     0 200.0
23:40:00 ROBLX1    45.0 155.0     0     0     0     0     0     0 200.0
The difference shows in the z/VM metrics. At 23:39 Linux has 2 virtual
CPUs that together still get ~50% of a CPU, so they wait for ~150%. If I
enable 2 more virtual CPUs, we would get 50% and wait for 350%. It does
not run the looping processes any faster (slower in fact, because of
overhead). But the Linux admin now sees 4 virtual CPUs and a loadavg of 4
and believes it is healthier... Virtualization changes the game. You
can't tell what's going on without the VM numbers.
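
The arithmetic behind those numbers, as a rough sketch. It assumes every
virtual CPU is kept busy by a looping process and that the guest gets a
fixed share of real CPU (50% of one CPU here, as in my experiment); the
function name is made up for illustration:

```python
def vcpu_wait_pct(n_vcpus, entitled_pct):
    """Return (run, cpu_wait) in percent of one real CPU.

    Assumes all n_vcpus want to run all the time and z/VM hands the
    guest at most entitled_pct of real CPU in total.
    """
    demand = 100.0 * n_vcpus          # each busy virtual CPU wants 100%
    run = min(demand, entitled_pct)   # what the guest actually gets
    return run, demand - run          # the remainder shows up as CPU wait

if __name__ == "__main__":
    for n in (1, 2, 4):
        run, wait = vcpu_wait_pct(n, 50.0)
        print(f"{n} virtual CPUs: run {run:.0f}%, CPU wait {wait:.0f}%")
```

Adding virtual CPUs only raises the wait figure; the run figure stays at
whatever the guest is entitled to.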
Rob
--
Rob van der Heij
Velocity Software
http://www.velocitysoftware.com/
----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/