On Thu, 6 Apr 2017 12:08:25 +0000, Vernooij, Kees (ITOPT1) - KLM 
<[email protected]> wrote:

>Hello,
>
>We sometimes experience long delays in job initiation, which I cannot explain. 
>It concerns a WLM managed jobclass, which is filled by Control-M. Every now 
>and then, we see e.g. 4 jobs running and an accumulating number of jobs in the 
>input queue, up to more than 100, waiting for one or more hours. When one job 
>ends, the next job starts, but the number of executing jobs remains 4.
>
>I have ruled out all obvious causes, such as a heavily loaded system, jobs not 
>eligible to run on that system, etc.
>
>From the Redbook "System Programmer's Guide to: Workload Manager" I found that 
>JES2 only follows WLM: it will start a job when WLM has started an 
>initiator ("If there are no free initiators, jobs run wherever another job 
>finishes, or WLM starts new initiators."). The $DSRVCLASS,LONG command 
>displays the number of initiators WLM has started for JES2 to use.
>
>So the number of running jobs is fully determined by WLM and I am trying to 
>find out why WLM does not start more initiators. The first period of the jobs 
>has a Response Goal of 30 seconds. Since this includes a job's Input Queue 
>time and jobs were waiting for several hours in the input queue, this should 
>only have been a reason to start extra initiators.
>
>I have produced SMF record 99 subtype 6 and it displays a lot of information 
>about the status of the Service Classes, like MPL-IN-TARGET and 
>MPL-OUT-TARGET, but I have the feeling that this applies to swapping IN and 
>OUT of already running tasks and does not say anything about jobs in the Input 
>Queue.
>
>My question is: which metrics can tell me more about WLM's decisions to start 
>Initiators, not start them or stop them?
>
>Thanks in advance.
>Kees.
>

If this is z/OS 2.2, there is a bug with JES2 keeping track of the number of
WLM INITs by serviceclass.  I saw it at one of my clients and the result was
way too much work getting routed to one of the LPARs even when that LPAR was
running at or close to 100%, and that led to some batch delays.  For example,
$DSRVCLASS,LONG showed 150 INITs for one particular serviceclass while an
SDSF "INIT WLM" command showed about 20.
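
The cross-check above can be sketched as a small script. This is only an
illustration, assuming the two counts are read off the command output by hand;
the function name and the tolerance parameter are hypothetical, and nothing
here parses real JES2 or SDSF output.

```python
def init_count_mismatch(jes2_count, sdsf_count, tolerance=0):
    """Return True when the two initiator counts look inconsistent.

    jes2_count: INIT count reported by $DSRVCLASS,LONG for one serviceclass
    sdsf_count: number of WLM initiators seen in SDSF for that serviceclass
    tolerance:  hypothetical allowance for initiators starting or stopping
                between the two displays
    """
    # A zero or negative JES2 count while initiators exist is suspicious.
    if jes2_count <= 0 and sdsf_count > 0:
        return True
    # Otherwise flag any difference larger than the allowed tolerance.
    return abs(jes2_count - sdsf_count) > tolerance


# Counts from the example above: 150 per JES2, about 20 per SDSF.
print(init_count_mismatch(150, 20))  # True
```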

See if you have APAR OA51343 and its prereqs installed.
http://www-01.ibm.com/support/docview.wss?crawler=1&uid=isg1OA51343

The APAR talks about a zero or negative count, but we saw a high count.  It also
applies to z/OS 2.1 but we never saw any problem on z/OS 2.1.  We started seeing
the problem when z/OS 2.2 was being rolled out to the first wave of LPARs in
a large sysplex.


Regards,

Mark
--
Mark Zelden - Zelden Consulting Services - z/OS, OS/390 and MVS
ITIL v3 Foundation Certified
mailto:[email protected]
Mark's MVS Utilities: http://www.mzelden.com/mvsutil.html
Systems Programming expert at http://search390.techtarget.com/ateExperts/