Hi Gary,
Since no one has answered your question, I'll attempt to do so.
First, please review pages 69-92 of SG24-5952 (Redbook: z/OS Intelligent
Resource Director). I believe that if you spend some time with these
pages, you will better appreciate the IRD algorithms. Particularly, study
pages 83-92, as these pages describe the WLM Vary CPU Management Logic.
Second, without seeing your actual data but using only the information that
you provided, it is likely that IRD is working as designed. Your important
work is meeting goals and IRD probably concluded that removing a logical
processor from your "bigger LPAR" would have the potential of causing
important work to miss goals. As the algorithms described in the
referenced pages show, WLM is reluctant to vary off CPUs unless there is a
clear advantage in doing so without harming important work. By the time
WLM gets to Importance 4 work in your environment, it probably concludes
that the risk of adjusting logical processors is not worth the potential
harm to higher importance work. From this perspective, WLM (as usual) is
biased toward getting the most work through the system while meeting goals
for important work.
Third, I believe that you might not fully appreciate the "short engine"
effect. Certainly, your statement that "MVS busy vs LPAR busy was over 30%
different" would not normally indicate a "short engine" effect. There
might be pathological cases in which this effect would be experienced with
such a relatively low delay to logical processors caused by their waiting
on the Logical Processor Ready Queue, but I've not seen any. For that
matter, CPExpert doesn't even begin to consider that a "short engine"
effect *might* exist until the ratio of MVS busy vs LPAR busy exceeds 2:1,
and this "low" ratio is used only to introduce users to the analysis and
implications of the "short engine" effect (that is, it encourages users to
read my documentation).
The "MVS busy vs LPAR busy was over 30% different" observation simply means
that there was a queue of waiting logical processors. Such a queue delays
work being processed, but from the standpoint of overall throughput it is
roughly equivalent to having address spaces "in and ready". Indeed, with a
1.4:1 logical to physical processor ratio and both LPARs near 100% busy,
you would expect a ratio of about 1.4:1 (or the 1.3:1 seen in your
example).
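Purely to illustrate that arithmetic (the function name and the example
CP counts are my own, not taken from RMF or CPExpert): when the CEC is
saturated, each logical CP can only be dispatched for roughly its share of
a physical engine, so MVS busy appears inflated over LPAR busy by about
the logical-to-physical ratio.

```python
def expected_mvs_to_lpar_busy_ratio(logical_cps, physical_cps):
    """Rough expected ratio of MVS busy to LPAR busy when the CEC is
    near 100% busy: each logical CP waits its turn for a physical engine,
    so the logical (MVS) view of busy exceeds the physical (LPAR) view
    by roughly the logical:physical ratio."""
    return logical_cps / physical_cps

# Illustrative numbers only: 14 logical CPs dispatched over 10 physical CPs
ratio = expected_mvs_to_lpar_busy_ratio(14, 10)
print(f"Expected MVS:LPAR busy ratio of roughly {ratio:.1f}:1")
```

So a 30% divergence is about what the 1.4:1 configuration predicts on its
own, with no "short engine" pathology required to explain it.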
In her paper "Optimal Performance When Running CICS in a Shared LPAR
Environment" presented at SHARE Session 1066, August 2004, Kathy Walsh (IBM
Washington System Center) described the symptoms of the "short engine"
effect. Kathy also provided an illustration of the "short engine" effect,
which showed a "short CP ratio" of more than 4:1 caused by very low
physical processor share with an IMS Data Sharing application. In her
example, the number of logical processors was reduced from 6 to 2, and IMS
response time dropped from over 2 seconds to less than 0.6 seconds. But
notice that her example used a 4:1 ratio as an indicator
of the "short engine" effect (rather than the 1.3:1 that you believe
indicates a "short engine" effect).
The "short engine" effect mostly applies to single-TCB subsystems (such as
many CICS environments) when LPARs are near 100% busy, weights are being
enforced, and the physical processor share per logical processor is low.
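To make "low physical processor share per logical processor" concrete,
here is a back-of-the-envelope sketch (the function name, weights, and CP
counts are hypothetical examples of mine, not values from your system):

```python
def share_per_logical_cp(lpar_weight, total_weight, physical_cps, logical_cps):
    """Fraction of one physical CP guaranteed to each logical CP when
    weights are being enforced: the LPAR's weighted share of the physical
    engines, spread across its online logical CPs."""
    lpar_physical_share = (lpar_weight / total_weight) * physical_cps
    return lpar_physical_share / logical_cps

# Hypothetical example: an LPAR holding 20% of the weights on a 10-way
# CEC, with 6 logical CPs online, guarantees each logical CP only about
# a third of a physical engine -- the kind of "short" engine Kathy's
# 4:1 example describes.
print(round(share_per_logical_cp(200, 1000, 10, 6), 2))
```

The smaller that per-logical-CP share, the more a single-TCB workload
suffers, because its one dispatchable task runs on an engine that delivers
only a fraction of a physical CP's speed.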
Page 62 of the referenced Redbook discusses this topic, posing the
question: "Is it better to have a CPC with a large number of slower CPs,
or one with a small number of fast ones?" The answer depends on your
specific workload and objectives.
Hope this helps,
Don
******
Don Deese, Computer Management Sciences, Inc.
Voice: (703) 922-7027 Fax: (703) 922-7305
http://www.cpexpert.org
******
At 04:36 PM 6/6/2005, you wrote:
All,
Has anyone besides me noticed IRD allow short on engines in an LPAR
cluster with Weight Management and CPU Vary enabled? We're at 1.4, and
as far as I know we've got all the updates for IRD.
I've had an experience on a large system (>10 PCPs) with 2 LPARS and the
CEC running at 100%, where every available LCPU was online to both
LPARS, min weights were set to 1 and max weights left blank. Importance
1 work was fine. Importance 2 work was hovering around PI 1.0. On the
"bigger LPAR" (70% of CEC) importance 3 work was running around PI 4.0,
and on the "smaller LPAR" importance 3 work was running in the mid 200's
for PI. IRD left ALL LCPUs online to the big LPAR, took some off the
smaller one, but still left us with a ratio of 1.45 LCPU to 1 PCP. As
importance 4 and 5 work on the smaller LPAR didn't get any time, the in
and ready queue stacked to the sky, going as high as 20 times the number
of physical CPUs it was dispatching. Taking some LCPUs off of the
larger LPAR allowed the in and ready queue on PAR2 to get caught up in
short order. IRD didn't take LCPUs off of the big LPAR to reduce the
short engine effect (MVS busy vs LPAR busy was over 30% different), and
it's puzzling.
Isn't IRD supposed to work to avoid the short engine effect?
Other than this one time, IRD's been working very well. I'll admit,
however, I don't know why IRD prefers to keep all available LCPUs online
when workloads are low - isn't the dispatching algorithm expensive
enough that this is a bad idea (book says: to maximize multiprocessing)?
Am I the only one who has seen this, and scratched their head and
wondered why?
Thanks and best regards,
Gary Diehl
--
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html