Hi Gary,
Since no one has answered your question, I'll attempt to do so.
First, please review pages 69-92 of SG24-5952 (Redbook: z/OS Intelligent
Resource Director). I believe that if you spend some time with these
pages, you will better appreciate the IRD algorithms. Particularly, study
pages 83-92, as these pages describe the WLM Vary CPU Management Logic.
Second, without seeing your actual data but using only the information that
you provided, it is likely that IRD is working as designed. Your important
work is meeting goals and IRD probably concluded that removing a logical
processor from your "bigger LPAR" would have the potential of causing
important work to miss goals. As the algorithms described in the
referenced pages show, WLM is reluctant to vary off CPUs unless there is a
clear advantage in doing so without harming important work. By the time
WLM gets to Importance 4 work in your environment, it probably concludes
that the risk of adjusting logical processors is not worth the potential
harm to higher importance work. From this perspective, WLM (as usual) is
biased toward getting the most work through the system while meeting goals
for important work.
Third, I believe that you might not fully appreciate the "short engine"
effect. Certainly, your statement that "MVS busy vs LPAR busy was over 30%
different" would not normally indicate a "short engine" effect. There
might be pathological cases in which this effect would be experienced with
such a relatively low delay to logical processors caused by their waiting
on the Logical Processor Ready Queue, but I've not seen any. For that
matter, CPExpert doesn't even begin to consider that a "short engine"
effect *might* exist until the ratio of MVS busy vs LPAR busy exceeds 2:1,
and this "low" ratio is used only to introduce users to the analysis and
implications of the "short engine" effect (that is, it encourages users to
read my documentation).
The "MVS busy vs LPAR busy was over 30% different" observation simply means
that there was a queue of waiting logical processors. Such a queue delays
work being processed, but from the standpoint of overall throughput it is
roughly equivalent to having address spaces "in and ready". Indeed, with a
1.4:1 logical to physical processor ratio and both LPARs near 100% busy,
you would expect a ratio of about 1.4:1 (or the 1.3:1 seen in your
example).
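Purely to illustrate that arithmetic (the function name and the example
CP counts are my own, not taken from RMF or CPExpert): when the CEC is
saturated, each logical CP can only be dispatched for roughly its share of
a physical engine, so MVS busy appears inflated over LPAR busy by about
the logical-to-physical ratio.

```python
def expected_mvs_to_lpar_busy_ratio(logical_cps, physical_cps):
    """Rough expected ratio of MVS busy to LPAR busy when the CEC is
    near 100% busy: each logical CP waits its turn for a physical engine,
    so the logical (MVS) view of busy exceeds the physical (LPAR) view
    by roughly the logical:physical ratio."""
    return logical_cps / physical_cps

# Illustrative numbers only: 14 logical CPs dispatched over 10 physical CPs
ratio = expected_mvs_to_lpar_busy_ratio(14, 10)
print(f"Expected MVS:LPAR busy ratio of roughly {ratio:.1f}:1")
```

So a 30% divergence is about what the 1.4:1 configuration predicts on its
own, with no "short engine" pathology required to explain it.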
In her paper "Optimal Performance When Running CICS in a Shared LPAR
Environment" presented at SHARE Session 1066, August 2004, Kathy Walsh (IBM
Washington System Center) described the symptoms of the "short engine"
effect. Kathy also provided an illustration of the "short engine" effect,
which showed a "short CP ratio" of more than 4:1 caused by very low
physical processor share with an IMS Data Sharing application. In her
example, the number of logical processors was reduced from 6 to 2, and IMS
response time dropped from over 2 seconds to less than 0.6 seconds. But
notice that her example used a 4:1 ratio as an indicator
of the "short engine" effect (rather than the 1.3:1 that you believe
indicates a "short engine" effect).
The "short engine" effect mostly applies to single-TCB subsystems (such as
many CICS environments) when LPARs are near 100% busy, weights are being
enforced, and the physical processor share per logical processor is low.
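To make "low physical processor share per logical processor" concrete,
here is a back-of-the-envelope sketch (the function name, weights, and CP
counts are hypothetical examples of mine, not values from your system):

```python
def share_per_logical_cp(lpar_weight, total_weight, physical_cps, logical_cps):
    """Fraction of one physical CP guaranteed to each logical CP when
    weights are being enforced: the LPAR's weighted share of the physical
    engines, spread across its online logical CPs."""
    lpar_physical_share = (lpar_weight / total_weight) * physical_cps
    return lpar_physical_share / logical_cps

# Hypothetical example: an LPAR holding 20% of the weights on a 10-way
# CEC, with 6 logical CPs online, guarantees each logical CP only about
# a third of a physical engine -- the kind of "short" engine Kathy's
# 4:1 example describes.
print(round(share_per_logical_cp(200, 1000, 10, 6), 2))
```

The smaller that per-logical-CP share, the more a single-TCB workload
suffers, because its one dispatchable task runs on an engine that delivers
only a fraction of a physical CP's speed.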
Page 62 of the referenced Redbook discusses this topic, posing the
question: "Is it better to have a CPC with a large number of slower CPs,
or one with a small number of fast ones?" The answer depends on your
specific workload and objectives.
Hope this helps,
Don
******
Don Deese, Computer Management Sciences, Inc.
Voice: (703) 922-7027 Fax: (703) 922-7305
http://www.cpexpert.org
******
At 04:36 PM 6/6/2005, you wrote:
All,
Has anyone besides me noticed IRD allow short on engines in an LPAR
cluster with Weight Management and CPU Vary enabled? We're at 1.4, and
as far as I know we've got all the updates for IRD.
I've had an experience on a large system (>10 PCPs) with 2 LPARS and the
CEC running at 100%, where every available LCPU was online to both
LPARS, min weights were set to 1 and max weights left blank. Importance
1 work was fine. Importance 2 work was hovering around PI 1.0. On the
"bigger LPAR" (70% of CEC) importance 3 work was running around PI 4.0,
and on the "smaller LPAR" importance 3 work was running in the mid 200's
for PI. IRD left ALL LCPUs online to the big LPAR, took some off the
smaller one, but still left us with a ratio of 1.45 LCPU to 1 PCP. As
importance 4 and 5 work on the smaller LPAR didn't get any time, the in
and ready queue stacked to the sky, going as high as 20 times the number
of physical CPUs it was dispatching. Taking some LCPUs off of the
larger LPAR allowed the in and ready queue on PAR2 to get caught up in
short order. IRD didn't take LCPUs off of the big LPAR to reduce the
short engine effect (MVS busy vs LPAR busy was over 30% different), and
it's puzzling.
Isn't IRD supposed to work to avoid the short engine effect?
Other than this one time, IRD's been working very well. I'll admit,
however, I don't know why IRD prefers to keep all available LCPUs online
when workloads are low - isn't the dispatching algorithm expensive
enough that this is a bad idea (book says: to maximize multiprocessing)?
Am I the only one who has seen this, and scratched their head and
wondered why?
Thanks and best regards,
Gary Diehl
--
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html