Malcolm,

That may have been lengthy, but it wasn't much longer than the SRM section in the Planning and Administration manual, and it was way more useful!
Thanks,
Lonny

-----Original Message-----
From: Malcolm Beattie [mailto:[EMAIL PROTECTED]]
Sent: Saturday, September 21, 2002 10:44 AM
To: [EMAIL PROTECTED]
Subject: Re: z/VM 4.2 Dispatching Problem running about 120 Linux copies

Sivey,Lonny writes:
> As a temporary measure I issued SET QUICKDSP ON for all the Linux images
> except 1. This caused some immediate paging activity that has since dropped
> back down to normal levels. There are not currently any Linux systems
> waiting on the eligible queue.

Yep, that's what QUICKDSP does. Instead of going into one of the usual
queues--Q1, Q2 or Q3--they go into a special queue, Q0, from where they get
dispatched as soon as possible. We'll see that lower down, and I'll explain
the figures below in gory detail, both to give you some confidence in the
settings and to help show others what they mean. I'll describe the situation
as it was before you switched on QUICKDSP for those Linux guests and hence
bypassed the SRM constraints/safety checks.

> q srm
> IABIAS : INTENSITY=90%; DURATION=2
> LDUBUF : Q1=100% Q2=75% Q3=60%

Those LDUBUF figures mean that guests in Q3 are only allowed to use 60% of
your paging "capacity". Since Linux guests are always in Q3 (unless you've
given them QUICKDSP, in which case, as mentioned, they avoid all these
restrictions and go straight to Q0), you'll only ever be able to use 60% of
your paging "capacity" with your Linux guests. Unless you want to reserve
40% of your paging "capacity" for non-Linux guests, you should raise the
LDUBUF settings. An example would be

  SET SRM LDUBUF 100 100 100

which would mean that guests in any queue would be able to compete for 100%
of your paging "capacity".

Now, why all those scare quotes around paging "capacity"? Because LDUBUF
doesn't measure the *amount* of page space; it measures how much "load" a
busily paging guest can put on your paging I/O subsystem (the LDU in LDUBUF
stands for "Loading User").
If CP thinks a guest is going to page heavily, it expects it to take up one
paging "exposure" (i.e. saturate I/O to one of your paging devices). The
Q3=60% of your current LDUBUF setting means that if all of your guests start
to page heavily, CP will only allow up to 60% * 7 of your Q3 guests to do
so. The 7 comes from the 7 exposures (subchannels for your paging devices)
listed in your "Q ALLOC PAGE" output below. That means only 4 of your
non-QUICKDSP Linux guests could go into heavy paging; numbers 5 and upwards
would get kicked into the eligible list (E3), sitting there doing nothing.
Unless you have other (non-Linux) guests expecting to use those other paging
exposures, you'd be wasting resources.

Having said all of that, LDUBUF wasn't the constraint you were hitting. Your
guests couldn't even get to the stage where they wanted to page heavily,
because they were constrained by STORBUF.

> STORBUF: Q1=125% Q2=105% Q3=95%

There you can see that all your Q3 guests (non-QUICKDSP Linux, yadda yadda)
could only take up, between them, 95% of your real storage before getting
kicked into the E3 eligible list. In the absence of the timer-on-demand
kernel feature, a Linux guest appears to CP to be actively using all of its
memory. In other words, CP calculates the guest's "working set" (roughly
speaking, the minimal set of pages that must be available for the guest to
keep doing useful work at a reasonable rate) to be the whole of the memory
allocated to the guest. That's almost always a wildly pessimistic estimate
for a Linux instance. The STORBUF setting of Q3=95% means that CP won't risk
taking memory away from your Q3 Linux guests once the sum of the Q3 guests'
working sets exceeds 95% of your physical real storage (that's a slight lie;
ignore the man behind the curtain). *This* is the constraint you ran into.
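To make the LDUBUF arithmetic concrete, here is a small Python sketch of the percentage-of-exposures calculation described above. The helper name and structure are my own for illustration; this is not CP's actual algorithm or any real interface.

```python
# Illustrative sketch (not CP code): how many heavily paging ("loading")
# users each class tolerates, as a percentage of the paging exposures,
# before further guests get demoted to the eligible list.

def ldubuf_limits(exposures, q1_pct, q2_pct, q3_pct):
    """Return the max loading users per class, truncated to whole guests."""
    return {
        "Q1": int(exposures * q1_pct / 100),
        "Q2": int(exposures * q2_pct / 100),
        "Q3": int(exposures * q3_pct / 100),
    }

# The system above: 7 paging exposures, LDUBUF Q1=100% Q2=75% Q3=60%
print(ldubuf_limits(7, 100, 75, 60))  # {'Q1': 7, 'Q2': 5, 'Q3': 4}
```

Note the Q3 result of 4: that is the "only 4 of your non-QUICKDSP Linux guests could go into heavy paging" figure from the explanation above.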
Raising that value much higher will remove this constraint (note that it can
be higher than 100% in order to overcommit memory: remember that CP is
summing up working set sizes, and that it wildly overestimates the working
set sizes of non-timer-on-demand Linux guests). So a setting of, for
example,

  SET SRM STORBUF 300 250 200

will mean that CP allows the (over)estimated working sets of your Linux
guests (Q3) to sum to twice the amount (200%) of real storage. Provided the
Linux guests' working sets are really, *on average*, only half of the memory
they've been given, the working sets will still fit OK into real storage. CP
will have to trim half their pages from them and put them onto your page
volumes (you discovered the same thing happening when you gave them all
QUICKDSP and made them bypass the STORBUF constraint that way). CP may be a
bit surprised to find it all works OK (given its much higher estimate of the
working sets) but we can't really tell :-). However, it shouldn't have to
make the guests page *heavily* if their working sets stay reasonably stable
and the guests don't start thrashing busily through memory at random.

Now that you've raised that STORBUF constraint, what happens if the Linux
guests *do* all start making heavy, random use of all of their memory at the
same time? CP is going to have to start paging stuff both in and out in
order to keep up. At what stage does it decide that it's a bad idea to do
all that I/O to page memory in for a guest, only to have to page it out
again a tiny instant later to make some available for another guest? And
then do some more paging soon after to get the first guest's memory back
again? At some point, it may be better to kick out an entire guest (or a
few) for a while so that the rest can get useful work done without
thrashing. Then, later on, it can kick out those guests and bring the first
lot back again. *That* is the job of the LDUBUF constraint we mentioned
before.
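The STORBUF sum-of-working-sets idea can also be sketched numerically. Everything here is my own illustration under assumed numbers (2 GB real storage, 120 guests whose working sets CP pessimistically estimates at 64 MB each); it is not CP's real admission logic.

```python
# Rough sketch (not CP's algorithm): CP sums the working-set estimates of
# Q3 guests and stops admitting them once the sum would exceed the STORBUF
# percentage of real storage; the rest land in the E3 eligible list.

def q3_admission(real_storage_mb, storbuf_q3_pct, working_sets_mb):
    """Count how many Q3 guests fit under the STORBUF cap, in order."""
    cap = real_storage_mb * storbuf_q3_pct / 100
    total = admitted = 0
    for ws in working_sets_mb:
        if total + ws > cap:
            break
        total += ws
        admitted += 1
    return admitted

# Assumed system: 2048 MB real storage, 120 guests at 64 MB estimated each.
print(q3_admission(2048, 95, [64] * 120))   # Q3=95%: only 30 guests fit
print(q3_admission(2048, 200, [64] * 120))  # Q3=200%: 64 guests fit
```

The point of the overcommit: if each guest's *real* working set is only half the estimate, the 200% setting still fits comfortably in real storage, exactly as argued above.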
If you've raised STORBUF and CP has to start paging *heavily* on behalf of
the guests, then it'll only allow a certain number of Q3 guests to do that
(the Q3 LDUBUF percentage of your paging exposures, as we saw above) before
kicking some guests out to E3 to sit it out. So LDUBUF acts as a backstop
for STORBUF once the overcommit of real storage starts to pinch.

> DSPBUF : Q1=32767 Q2=32767 Q3=32767

The DSPBUF setting for a queue simply limits the number of guests allowed to
be in the associated dispatch queue. Having DSPBUF Q3=32767 means you can
have 32767 Q3 guests all dispatched at the same time before any get kicked
out to E3. In other words, it's effectively unlimited unless you want to
play Test Plan Foo games. If you set DSPBUF Q3 down to, say, 40, then when
Linux guest number 41 started up (without timer-on-demand and without
QUICKDSP), it would get kicked straight into E3 and sit there like a lemon.
(This is what happened to us on the Large Scale Linux on VM residency
recently until I figured out it was the DSPBUF setting that was too low.) I
(personally) haven't come across any situation involving Linux-only guests
under VM in which setting DSPBUF Q3 low enough to be relevant has ever been
a useful constraint. Obviously, this doesn't mean I recommend setting it to
a huge number on every system and, as with any performance tuning, it needs
to be done carefully by someone who understands what is going on, after
getting full information on the specific system involved. (OK, enough CYA.)
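The DSPBUF cut-off is the simplest of the three constraints, and a toy sketch shows the guest-41 anecdote directly. The guest names and helper are hypothetical, purely to illustrate the hard count limit.

```python
# Toy illustration (my own sketch, not CP code) of the DSPBUF hard limit:
# with DSPBUF Q3=40, only the first 40 Q3 guests are dispatched and the
# 41st lands straight in the E3 eligible list.

def dispatch_split(q3_guests, dspbuf_q3):
    """Split Q3 guests into the dispatch list and the E3 eligible list."""
    return q3_guests[:dspbuf_q3], q3_guests[dspbuf_q3:]

guests = [f"LINUX{n:03d}" for n in range(1, 42)]  # 41 hypothetical guests
dispatched, e3 = dispatch_split(guests, 40)
print(len(dispatched), e3)  # 40 ['LINUX041']
```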
> q alloc page
>                 EXTENT   EXTENT   TOTAL   PAGES    HIGH     %
> VOLID  RDEV      START      END   PAGES  IN USE    PAGE  USED
> ------ ----     ------   ------  ------  ------  ------  ----
> 420RES 2271        194      277   15120   15120   15120  100%
>                    639      688    9000    8997    9000   99%
> VMPASP 2281          0     1499  270000  133939  269977   49%
> VMPA01 227E          0     3338  601020  132574  285116   22%
> VMPA02 227F          0     3338  601020  136540  285106   22%
> VMPA03 2280          0     3338  601020  131633  285112   21%
> VMPA04 2282          0     3338  601020  131783  277199   21%
> VMPA05 2283          0     3338  601020  135393  284971   22%
>                                  ------  ------          ----
> SUMMARY                           3299K  825979           25%
> USABLE                            3299K  825979           25%

As mentioned earlier, this shows how many paging exposures you have, so that
you can work out when LDUBUF will start kicking in. (It also shows you have
a couple of paging extents on your sysres volume, which probably isn't
optimal. You have a good number of other paging volumes, and CP can do
special optimisation tricks for those (seldom-ending channel programs, yadda
yadda). It's unlikely to be able to do those on your sysres pack, which is a
bit of a pity.)

> ind
> AVGPROC-035% 01
> MDC READS-000003/SEC WRITES-000000/SEC HIT RATIO-100%
> STORAGE-094% PAGING-0001/SEC STEAL-000%
> Q0-00126(00000) DORMANT-00012

That 126 for Q0 shows that you have 126 guests with QUICKDSP ON, so those
guests are bypassing all the SRM constraints I mentioned above and getting
dispatched regardless.

> Q1-00000(00000) E1-00000(00000)
> Q2-00000(00000) EXPAN-001 E2-00000(00000)
> Q3-00001(00000) EXPAN-002 E3-00000(00000)

That 1 for Q3 is the one Linux guest you did *not* give QUICKDSP. If you
wish, you may care to change the SRM settings (raise LDUBUF and STORBUF as
described above) and then remove QUICKDSP from the Linux guests. You should
find (if you get the STORBUF setting right) that the Linux guests can
survive without QUICKDSP. This has the advantage that CP still has its SRM
constraints as a backstop in case the Linux guests start thrashing the
system.
If that does happen, then it may well be preferable to have some of them go
into the eligible list rather than have the system thrash without getting
any useful work done.

Hope this (long) explanation has been useful to some.

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Linux Technical Consultant, IBM EMEA Enterprise Server Group...
...from home, speaking only for myself
