Re: WLM issue with a proposed solution

Tom Marchant Thu, 28 Apr 2016 17:58:30 -0700

On Thu, 28 Apr 2016 18:22:11 +0000, Tracy Adams wrote:

>We have a soft capped LPAR that runs our DB2 and CICS regions and during 
>the day some "marketing batch".  On Wednesdays, the marketing batch (online 
>submit via CICS) increases and by afternoon we hit our 4 hour soft cap.  Once 
>or twice while we are capped, the busiest CICS slow down to the point where 
>some old automation kicks in to kill transactions over 45 seconds old, some of 
>these transactions dump through DumpMaster, we then go to max sockets and 
>more transactions dump and in 10 - 30 seconds all is fine again. 
 
>What I see: The CICS regions have a DP around EC and are meeting their 
>service goal of 99% under .5 seconds.  But there are tens of thousands 
>transactions that have led to this.  The batch jobs (3-5 of them), while
>running 10 - 15 % cpu have a DP of C0 and are in a discretionary level of
>the service class.  I believe the problem lies with the DB2 service class.
>That has a definition of velocity at 66  and it tends to run below that
>when there is more contention in the system.  The DP of the DB2 region is F6.


Are your CICS regions still meeting their goals when these anomalies occur?

If your batch jobs are running Discretionary at a DP lower than CICS, it is
very unlikely that they are causing significant CICS delays.

You say that DB2 sometimes fails to meet its goals when the system is loaded. 
That suggests to me that 66% isn't achievable, and it may be causing WLM to 
work extra hard to try to meet that goal. If the DB2 address spaces really are 
running at higher DP than CICS when the problems occur, then they are 
probably ok.
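
To put rough numbers on the velocity point, here is a minimal sketch. It
assumes the usual definition of execution velocity, using samples divided by
using plus delay samples, times 100. The function name and the sample counts
are made up for illustration and are not taken from the system described.

```python
def execution_velocity(using_samples: int, delay_samples: int) -> float:
    """Execution velocity: share of samples where the work was using
    resources rather than being delayed for them, as a value 0-100."""
    total = using_samples + delay_samples
    if total == 0:
        return 0.0  # no samples in the interval, nothing to report
    return 100.0 * using_samples / total


# Hypothetical sample counts, for illustration only.
lightly_loaded = execution_velocity(using_samples=700, delay_samples=300)
while_capped = execution_velocity(using_samples=500, delay_samples=500)

print(f"lightly loaded: {lightly_loaded:.0f}")  # above a goal of 66
print(f"while capped:   {while_capped:.0f}")    # below a goal of 66
```

If the delay samples grow while the LPAR is capped, the achieved velocity
drops below the goal regardless of how important the work is.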

Are your batch jobs using DB2 or other high priority address spaces?

If your DB2 address space goals are too aggressive, dropping the velocity
from 66 to 60 won't make much difference. Have you read John Arwe's paper on
velocity goals?

I'm not a fan of percentile goals as high as 99%. It doesn't take many
outliers to cause you to fail to meet your goal. Assuming that the vast
majority of your transactions are quick, it won't matter whether your
percentile is 99% or e.g. 80%. I like to set my percentile response times
for the fastest transactions in each CICS address space and let the rest go
along for the ride. That's not likely your problem though.
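
The arithmetic behind the outlier point can be sketched as follows. The
transaction counts and response times are invented for illustration, and the
helper name is hypothetical. It only shows how few slow transactions it takes
to miss a 99% goal that an 80% goal would still meet.

```python
def meets_percentile_goal(response_times, goal_seconds, percentile):
    """True if at least `percentile` percent of the response times
    are at or under `goal_seconds`."""
    within = sum(1 for t in response_times if t <= goal_seconds)
    return 100.0 * within / len(response_times) >= percentile


# Hypothetical interval: 10,000 transactions, 150 of them slow outliers.
times = [0.1] * 9850 + [2.0] * 150

print(meets_percentile_goal(times, 0.5, 99))  # 98.5% within goal: missed
print(meets_percentile_goal(times, 0.5, 80))  # same data: met
```

With a 99% goal, anything over 100 slow transactions in 10,000 misses the
goal. With an 80% goal, the same interval passes comfortably.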

How does your "old automation" determine that there is a problem?

When one of these long-running transactions is canceled, do you know what
was going on in it? Are they just unusual transactions that take a long
time, or are they in a loop or something?

I wonder if the real problem is that this automation is canceling
transactions that it shouldn't.

-- 
Tom Marchant

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
