Check the steal column. This sample is an outlier, but the steal numbers on
this server are frequently high, which is why I am asking. We do see
performance drops at these points, so something is not right.


procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0    860  14652 937188 888260    0    0     0     0  108   24  0  0 100  0  0
 0  0    860  14652 937188 888260    0    0     0     0  109   23  0  1  98  0  1
 1  0    860  14232 937188 888260    0    0     0     0  107   33  1  1  43  0 55
 1  0    860  13984 937196 888252    0    0    20    48  133  223  0  1   9  5 85
 0  1    860  13860 937200 888248    0    0     8    48  511 1283  1  1  26  2 70
 0  0    860  13860 937200 888248    0    0    16     0  713 1283  1  2  26  7 64
 0  0    860  13860 937200 888248    0    0     0     0  510 1118  1  1  71  0 27
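To pull the steal figures out of a capture like the one above, a small script
helps. This is a minimal Python sketch, assuming the standard 17-column Linux
vmstat layout that ends in "us sy id wa st"; the function names and the 20%
threshold are illustrative choices, not anything vmstat itself provides.

```python
# Minimal sketch: parse vmstat sample lines and flag intervals with high steal.
# Assumes the standard 17-column Linux vmstat layout ending in "us sy id wa st".
# The 20% threshold is an arbitrary illustration, not a recommended value.

COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
           "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]

SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0    860  14652 937188 888260    0    0     0     0  108   24  0  0 100  0  0
 0  0    860  14652 937188 888260    0    0     0     0  109   23  0  1  98  0  1
 1  0    860  14232 937188 888260    0    0     0     0  107   33  1  1  43  0 55
 1  0    860  13984 937196 888252    0    0    20    48  133  223  0  1   9  5 85
 0  1    860  13860 937200 888248    0    0     8    48  511 1283  1  1  26  2 70
 0  0    860  13860 937200 888248    0    0    16     0  713 1283  1  2  26  7 64
 0  0    860  13860 937200 888248    0    0     0     0  510 1118  1  1  71  0 27
"""


def parse_vmstat(text):
    """Return one dict per data line, keyed by vmstat column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the two header lines (they are not all digits).
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def high_steal(rows, threshold=20):
    """Return (index, st) for every sample whose steal percentage exceeds threshold."""
    return [(i, row["st"]) for i, row in enumerate(rows) if row["st"] > threshold]


if __name__ == "__main__":
    rows = parse_vmstat(SAMPLE)
    for i, st in high_steal(rows):
        print(f"sample {i}: st={st}% us+sy={rows[i]['us'] + rows[i]['sy']}%")
```

The same functions work on live output from something like `vmstat 5`, since
the parser simply ignores the header lines vmstat repeats.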

-----Original Message-----
From: The IBM z/VM Operating System [mailto:[email protected]] On Behalf Of Rob van der Heij
Sent: Saturday, November 14, 2009 11:13 AM
To: [email protected]
Subject: Re: zVM CPU allocation

On Fri, Nov 13, 2009 at 6:05 PM, Michael MacIsaac <[email protected]> wrote:
>
>> As Rob and Alan have less blatantly stated, Linux CPU numbers are bogus
>> in a virtual environment.
> However, with the addition of "steal percentage" (%st in top), the amount of
> CPU that is being "stolen" by the hypervisor, I believe many would agree
> that they are less bogus.

I like your "less bogus" qualification. I'm trying to be more PC and
call it "different". And if we're into word games, I don't like
"steal" in this context. It suggests something you had was taken away.
But in this case, you did not have it and it could not be taken from
you :-)

Most people *do* agree that you need both Linux and z/VM data to make
sense of it or understand whether there is a problem. When someone
claims to find wisdom in a single metric, you normally don't have
to try very hard to prove him wrong.

It is very easy to explain why the old Linux numbers were wrong (and
by how much) when you had the z/VM data already. We use the VM monitor
data to correct the Linux data.
It is true that with the "virtual CPU time accounting" in Linux (which
is what produces the steal time), the numbers are no longer affected by
that virtualization effect. They are still a bit off, but in normal
situations the difference can be ignored. Unfortunately, I often deal
with abnormal situations where people have performance problems.
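For reference, the steal time this accounting produces shows up on the
aggregate "cpu" line of /proc/stat as the eighth counter (after user, nice,
system, idle, iowait, irq and softirq), in USER_HZ ticks; top and vmstat derive
%st from the change between two samples. A minimal Python sketch of that
calculation, assuming a kernel recent enough to report the steal field (the
helper names are hypothetical, not from any tool):

```python
import os
import time

# Counter order on the aggregate "cpu" line of /proc/stat (values are USER_HZ ticks).
# guest and guest_nice are deliberately left out: the kernel already counts them
# in user and nice, so adding them again would double count.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    if len(parts) < 1 + len(FIELDS):
        raise ValueError("kernel does not report a steal counter")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def cpu_percentages(before, after):
    """Percentage of the elapsed ticks spent in each state between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in delta.items()}


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, which is the all-CPU summary."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1)
    second = parse_cpu_line(read_cpu_line())
    pct = cpu_percentages(first, second)
    print(" ".join(f"{k}={v:.1f}%" for k, v in pct.items()))
```

Two samples are needed because the counters are cumulative since boot; a
single read says nothing about the current interval.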

In my "Understanding CPU Usage" presentation I show a case where z/VM
claims the guest uses 30% of a CPU, Linux says it uses 6% of a CPU,
and when you look for detailed per-process usage it adds up to 3% of a
CPU (with the new improved numbers). I bring a stuffed penguin for
someone in the audience who thinks Linux numbers are correct. And each
time it goes back home with me ;-) This was indeed caused by a kernel
bug. I think we identified 3 problems with the new CPU accounting in
Linux because we compare both sets of numbers and want them to be correct.
Eventually those bugs get fixed in your systems too.

Whether the numbers are more correct or more often correct is not
really the issue. I believe what matters more is the value those
numbers have for the people who see them, and whether they help you
solve the performance problem. The reason the Linux admin looks at CPU
usage is not that he is worried about wearing out the CPU, but that he
believes the rest is still available for him to use. In a virtualized
environment it does not work like that. There are still plenty of tools
out there that don't show steal as part of the metrics, but only
user, nice, and system. In that case it is better to see 99% and know the
system is out of CPU than to see 25% and have no clue.
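To make the 99% versus 25% point concrete, here is a minimal Python sketch that
takes one interval of CPU percentages, using vmstat's column names, and computes
what a tool that knows only user, nice and system would report next to the share
the guest could not use once steal is counted. The function names are
hypothetical; vmstat folds nice into "us", so "ni" defaults to 0.

```python
def naive_busy(pct):
    """Busy % as seen by a tool that only knows user, nice and system."""
    return pct["us"] + pct.get("ni", 0) + pct["sy"]


def busy_or_stolen(pct):
    """Busy % plus steal: the part of the interval the guest could not use for new work."""
    return naive_busy(pct) + pct["st"]


def headroom(pct):
    """What was really left for the guest: idle plus iowait."""
    return pct["id"] + pct["wa"]


if __name__ == "__main__":
    # One interval taken from the vmstat capture at the top of this thread.
    sample = {"us": 0, "sy": 1, "id": 9, "wa": 5, "st": 85}
    print(f"naive busy:     {naive_busy(sample)}%")
    print(f"busy or stolen: {busy_or_stolen(sample)}%")
    print(f"headroom:       {headroom(sample)}%")
```

For that interval the naive view reports 1% busy, suggesting plenty of spare
CPU, while only 14% (idle plus iowait) was actually left for the guest.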

But back to the original problem. If the 3 IFLs run at 10-15%, we do
not expect the old metrics in Linux to show 99%. There must be
something else in the system causing this, and the monitor data would
reveal the cause.

Rob
--
Rob van der Heij
Velocity Software
http://www.velocitysoftware.com/


-----------------------------------------------------
Please see the following link for the BlueCross BlueShield of Tennessee E-mail 
disclaimer:  http://www.bcbst.com/email_disclaimer.shtm
