On Thu, 28 Apr 2016 18:22:11 +0000, Tracy Adams wrote:

>We have a soft capped LPAR that runs our DB2 and CICS regions and, during
>the day, some "marketing batch". On Wednesdays, the marketing batch (online
>submit via CICS) increases, and by afternoon we hit our 4 hour soft cap. Once
>or twice while we are capped, the busiest CICS regions slow down to the point
>where some old automation kicks in to kill transactions over 45 seconds old.
>Some of these transactions dump through DumpMaster, we then go to max sockets,
>more transactions dump, and in 10 - 30 seconds all is fine again.
>
>What I see: The CICS regions have a DP around EC and are meeting their
>service goal of 99% under .5 seconds. But there are tens of thousands of
>transactions that have led to this. The batch jobs (3-5 of them), while
>running 10 - 15% CPU, have a DP of C0 and are in a discretionary level of
>the service class. I believe the problem lies with the DB2 service class.
>That has a definition of velocity at 66, and it tends to run below that when
>there is more contention in the system. The DP of the DB2 region is F6.
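For readers less familiar with velocity goals: WLM execution velocity is the percentage of sampled states in which the work was found using a resource rather than delayed, i.e. Using / (Using + Delay) x 100. The Python sketch here only illustrates that arithmetic. The function name `execution_velocity` and the sample counts are hypothetical, not taken from Tracy's system. It shows how a class defined at velocity 66 misses its goal once contention pushes the share of delay samples above roughly a third.

```python
def execution_velocity(using_samples: int, delay_samples: int) -> float:
    """Execution velocity as a percentage: Using / (Using + Delay) * 100.

    Hypothetical helper for illustration; WLM/RMF compute this from their own
    sampling, and the counts passed in here are made up.
    """
    total = using_samples + delay_samples
    if total == 0:
        raise ValueError("no using or delay samples in this interval")
    return 100.0 * using_samples / total


GOAL = 66  # velocity goal of the DB2 service class in the question

# Hypothetical intervals: a quiet one and one with more contention.
for label, using, delay in [("quiet", 700, 300), ("busy", 550, 450)]:
    v = execution_velocity(using, delay)
    print(f"{label}: velocity {v:.1f}, goal {GOAL}, met={v >= GOAL}")
```

In the "busy" interval the work is still getting plenty of service, but because 45% of the samples found it delayed, the computed velocity (55) falls short of 66.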
Are your CICS regions still meeting their goals when these anomalies occur?

If your batch jobs are running Discretionary at a DP lower than CICS, it is very unlikely that they are causing significant CICS delays.

You say that DB2 sometimes fails to meet its goal when the system is loaded. That suggests to me that a velocity of 66% isn't achievable, and it may be causing WLM to work extra hard trying to meet that goal. If the DB2 address spaces really are running at a higher DP than CICS when the problems occur, then they are probably OK. Are your batch jobs using DB2 or other high priority address spaces? If your DB2 address space goals are too aggressive, dropping the velocity from 66 to 60 won't make much difference. Have you read John Arwe's paper on velocity goals?

I'm not a fan of percentile goals as high as 99%. It doesn't take many outliers to cause you to fail to meet your goal. Assuming that the vast majority of your transactions are quick, it won't matter whether your percentile is 99% or, e.g., 80%. I like to set my percentile response time goals for the fastest transactions in each CICS address space and let the rest go along for the ride. That's not likely your problem, though.

How does your "old automation" determine that there is a problem? When one of these long-running transactions is canceled, do you know what was going on in it? Are they just unusual transactions that take a long time, or are they in a loop or something? I wonder if the real problem is that this automation is canceling transactions that it shouldn't.

--
Tom Marchant

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
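Tom's point that a few outliers can sink a 99% percentile goal, while an 80% goal is unaffected, comes down to simple arithmetic. The Python sketch here is a simplified illustration, not how WLM evaluates goals: WLM works from a response-time distribution rather than a raw list of times, and the helper name `percentile_goal_met` and the workload figures are hypothetical.

```python
def percentile_goal_met(response_times: list[float],
                        threshold: float,
                        percentile: float) -> bool:
    """True if at least `percentile` percent of transactions finish within
    `threshold` seconds. Hypothetical helper for illustration only."""
    if not response_times:
        raise ValueError("no transactions to evaluate")
    within = sum(1 for t in response_times if t <= threshold)
    return 100.0 * within / len(response_times) >= percentile


# Hypothetical workload: 9,800 fast transactions plus 200 slow outliers (2%).
times = [0.05] * 9800 + [2.0] * 200

print(percentile_goal_met(times, 0.5, 99))  # only 98% finish in 0.5 s
print(percentile_goal_met(times, 0.5, 80))
```

With 2% outliers the 99% goal is missed even though 98% of the work is fast, while the 80% goal is met; the fast transactions look the same to both goals.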
