Re: [z/OS v1.7] WLM Performance periods: durations vs. service units

Don Deese Thu, 29 Mar 2007 13:05:46 -0800

Hey, wait a minute guys.

I think that there is a bit of confusion on DUR and how TSO transactions transit to Period 2.

1. Duration is the amount of service that a transaction should consume in a period before going on to the next period. This is NOT service units per second, but total service units consumed. Thus, your 750 service units do not equate to clock seconds in any regard. The 750 service units are composed of CPU (SRB and TCB) service units, plus I/O service units, plus (potentially) MSO service units. These basic categories of service are adjusted by the service coefficients (CPU, IOC, MSO, SRB). The resulting service unit measures are basically unrelated to elapsed clock time.
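
As an illustration of point 1, the following is a minimal sketch of how a weighted service total could be built from the four categories. The coefficient values and the two sample transactions are hypothetical, chosen only so that the arithmetic is easy to follow; they are not taken from any real service definition. Note that elapsed time does not appear anywhere in the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceCoefficients:
    """Installation-defined weights (the values used below are hypothetical)."""
    cpu: float
    srb: float
    ioc: float
    mso: float


@dataclass(frozen=True)
class RawService:
    """Unweighted service accumulated by one transaction, by category."""
    cpu: float  # TCB (task) CPU service units
    srb: float  # SRB CPU service units
    ioc: float  # I/O service units
    mso: float  # storage (MSO) service units


def weighted_service(raw: RawService, coef: ServiceCoefficients) -> float:
    """Sum each category of service multiplied by its coefficient."""
    return (coef.cpu * raw.cpu + coef.srb * raw.srb
            + coef.ioc * raw.ioc + coef.mso * raw.mso)


# Hypothetical coefficients and a Period 1 duration of 750 service units.
coef = ServiceCoefficients(cpu=1.0, srb=1.0, ioc=0.5, mso=0.0)
DUR = 750.0

# Two made-up transactions with very different resource mixes.
cpu_burner = RawService(cpu=700.0, srb=50.0, ioc=0.0, mso=0.0)
pds_scroller = RawService(cpu=400.0, srb=50.0, ioc=600.0, mso=10000.0)

for name, raw in (("cpu_burner", cpu_burner), ("pds_scroller", pds_scroller)):
    total = weighted_service(raw, coef)
    print(name, total, "reached DUR" if total >= DUR else "below DUR")
```

Both sample transactions reach the same weighted total with different mixes of CPU and I/O, and nothing in the total says how long either one took on the clock.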

2. There is not a direct relationship between service units consumed and elapsed time of the transaction (consider a CPU burner versus someone scrolling a PDS). The RMF 14 buckets are buckets of response times. They do not represent service units consumed. You cannot legitimately say that the transactions ending in bucket 14 consumed any more service than transactions ending in any other bucket. All you can say is that if they ended in Period 1, then they probably consumed less than 750 service units in your case (actually, 750 plus the amount that transactions consumed before the SRM noticed and booted them into Period 2...which can be a huge amount of service units for CPU burners). For that matter, the delays to TSO transactions often are more a function of other workloads (especially workloads running at a higher Goal Importance) than anything inherent in the TSO transactions themselves.

3. Depending on how you have set RMPTTOM, you might find that significantly more service units were consumed in TSO Period 1 than you might have specified. This is because setting RMPTTOM to large values means that the SRM will check less frequently to see whether a DUR value was exceeded. In data sent to me by some CPExpert users, I see the AVERAGE service units consumed per transaction in TSO Period 1 to be several times higher than the DUR value for TSO Period 1!
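
The overshoot described in point 3 can be sketched with a deliberately simplified model: a transaction consumes service at a constant rate, and the period switch happens only at the first periodic check that finds the cumulative total at or above DUR. This is an assumption made for illustration, not the SRM's actual algorithm, and `check_interval_sec` is a hypothetical parameter (how it maps onto a real RMPTTOM setting is not modeled here). The consumption rates are also made up.

```python
import math


def service_at_period_switch(dur: float,
                             rate_su_per_sec: float,
                             check_interval_sec: float) -> float:
    """Service consumed when the first periodic check sees total >= dur.

    Simplifying assumptions: constant consumption rate, and the
    transaction starts exactly on a check boundary.
    """
    if dur <= 0 or rate_su_per_sec <= 0 or check_interval_sec <= 0:
        raise ValueError("all arguments must be positive")
    per_check = rate_su_per_sec * check_interval_sec
    checks_needed = math.ceil(dur / per_check)
    return checks_needed * per_check


DUR = 750.0

# A hypothetical CPU burner consuming 8000 SU/sec, checked every 1.0 sec
# versus every 0.25 sec, and a light transaction consuming 100 SU/sec.
for rate, interval in ((8000.0, 1.0), (8000.0, 0.25), (100.0, 1.0)):
    used = service_at_period_switch(DUR, rate, interval)
    print(f"rate={rate} interval={interval} used={used} ratio={used / DUR:.2f}")
```

In this model a heavy consumer overshoots DUR by a wide margin when checks are infrequent, while a light consumer lands close to DUR, which is the pattern the paragraph above describes.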

4. The design of multiple service class periods mostly focused on service consumption. The idea is that heavy users of service should not be in a position to interfere unreasonably with low users of service. If the heavy users of service get migrated to Period 2, the result is that the low users of service would not be unreasonably delayed in their response time. From a practical view, the concept of service mostly revolves around CPU service.

To a large extent, this idea is a carry-over from pre-SP5.2 days, when a dispatchable unit of work could monopolize the dispatching queue ahead of other work at the same dispatching priority. Since the "fair access" algorithm introduced with SP5.2 eliminated this dispatching problem, a lot of the technical need for multiple periods went away. Only in the case of serious resource consumption by lots of dispatchable units executing concurrently in Period 1 should this become a problem.

5. In many cases, you will not see any better or worse response to trivial transactions in TSO Period 1 by introducing a TSO Period 2 (there are exceptions, of course). Mostly, TSO transactions should migrate to TSO Period 2 based on management decisions rather than technical decisions (for example, "get those heavy resource transactions into TSO Period 2 Importance 3 or 4, where they will compete with other resource consumers at Importance 3 or 4, and the competition at lower Importance will discourage users from submitting that type of transaction under TSO", or some such management scheme).

6. The percent of transactions that end in TSO Period 1 is not a universal objective. There is nothing whatsoever "magic" about 75% ending in Period 1. The percent ending in Period 1 should be a function of your management objectives versus the resources consumed by various kinds of transactions (that is, 90% or 95% or 100% ending in Period 1 can be a valid objective in the right environment). Other than management objectives, the overriding technical concern should be how many transactions execute concurrently in Period 1 (and thus have the potential for interfering with each other for access to a CPU). In an LPAR with multiple logical processors, this potential for interference decreases substantially (think queuing model effects).
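
One way to see the "queuing model effects" mentioned in point 6 is the textbook Erlang C formula for an M/M/c queue, which gives the probability that an arriving unit of work has to wait because all servers are busy. The choice of M/M/c (Poisson arrivals, exponential service times, identical servers) is an assumption made here for illustration; it ignores how logical processors are actually dispatched, and the 70% utilization figure is hypothetical.

```python
from math import factorial


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arrival must wait in an M/M/c queue.

    offered_load is arrival rate divided by per-server service rate
    (in Erlangs) and must be smaller than the number of servers.
    """
    if servers < 1:
        raise ValueError("servers must be at least 1")
    if not 0 <= offered_load < servers:
        raise ValueError("offered_load must be in [0, servers)")
    utilization = offered_load / servers
    # Terms for the states in which at least one server is free.
    free_terms = sum(offered_load ** k / factorial(k) for k in range(servers))
    # Term for the states in which all servers are busy.
    busy_term = (offered_load ** servers / factorial(servers)) / (1.0 - utilization)
    return busy_term / (free_terms + busy_term)


# Same per-processor utilization (70%, hypothetical), more processors.
for n in (1, 2, 4, 8):
    print(n, round(erlang_c(n, 0.7 * n), 3))
```

At a fixed per-processor utilization, the probability of having to wait falls as the number of servers grows, which is the sense in which concurrent Period 1 transactions interfere less with each other when more logical processors are available.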


Regards,

Don

******
Don Deese, Computer Management Sciences, Inc.
Voice: (703) 922-7027  Fax: (703) 922-7305
http://www.cpexpert.org
******



At 07:11 PM 3/28/2007, you wrote:

I'm reviewing our Workload Manager policy which hasn't really changed
since we implemented Goal Mode with OS/390 and an S/390 2003-237.
Granted, we haven't had many *real* problems over the years but ...

I'm trying to confirm my (lack of) understanding regarding the
duration value for a Performance period and the Service Units/second
I find in the RMF WLM report.  From what I read in the Planning: WLM
manual, "Duration: Specifies the length of the period in service
units."  Does that imply that the 750 specified for Period 1 duration
equated to approx. 0.44 clock seconds with the 2003 (1724.7 SU/sec);
exclusive of wait times, natch.  And, by extension, does that mean
the period is now down to 0.11 clock seconds on our latest z/890
(8084 SU/sec)?  The manual is not helpful, nor can I find anything
in the Systems Programmer's redbook.  Anybody with a better
reference/guide they can point me to?

My concerns centre 'round two cases in our local environment.  1) TSO
period 1 is 75% completion in 0.5 sec with a duration of 750 SU and
period 2 has Velocity>15.  I'm worried that 750 SU is not really long
enough for a half-second duration, i.e. that tasks are dropping
(almost) straight away into period 2.  I'll be researching that in
the RMF report(s) later.

2) I'm looking to split a service class into two periods because it's
exhibiting the classic 'valley' graph of response times, i.e. 90% of
transactions are roughly split between bottom (0.5) and top (>4.0)
buckets.  Its current definition is 50% complete in 1 second and I've
got *one* 4hr sample of 65% at 0.5 sec and 27% at >4 sec.  It was
suggested, during a course, that this implies multiple periods.  My
direct problem is how to determine the duration value?


