Re: Long-running batch job (high CONN time)

Hal Merritt Wed, 01 Oct 2008 08:48:50 -0700

Not a lot to go on. For example, we don't even know how many files are
involved. Assuming only one, then how is it being accessed? Sequential?
VSAM sequential? VSAM random? Reading? Writing? Some of both? Each of
those would suggest a different attack vector.


The numbers suggest a 'knee of the curve' phenomenon, where only a slight
increase in load results in a huge jump in clock time. This can also
be described as a resource saturation event.
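
To illustrate the 'knee', here is a minimal sketch using the textbook
M/M/1 single-server queue, where mean response time is service time
divided by (1 - utilization). The queueing model, the function name and
the 1 ms service time are assumptions chosen for illustration; nothing
in the thread says the devices involved behave this way.

```python
# Hypothetical illustration: an M/M/1 single-server queue. This model is an
# assumption for the sketch, not something measured on the system discussed.

def response_time(service_ms: float, utilization: float) -> float:
    """Mean response time (service plus queueing) for an M/M/1 server."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    service_ms = 1.0  # assumed service time per I/O, purely illustrative
    for u in (0.50, 0.70, 0.80, 0.90, 0.95):
        print(f"utilization {u:.0%}: {response_time(service_ms, u):.1f} ms")
```

Under this model, going from 50% to 90% busy raises response time
fivefold, and the last few percent before saturation cost the most.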

I'd shy away from system issues at first and focus on the most common
application issues. 

If this is plain old QSAM, then I'd make sure the block size is maxed
(half track) and ample (hundreds) buffers are specified.
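
As a rough sketch of the half-track arithmetic for fixed-length blocked
records: the block size must be a multiple of the record length, so the
target is the largest multiple that still fits two blocks per track.
The 27998-byte limit assumes 3390 geometry, which the thread does not
state, and the function name is hypothetical.

```python
# Assumption: 3390 DASD, where 27998 bytes is the largest block size that
# still fits two blocks on one track. The thread does not name the device.
HALF_TRACK_3390 = 27998


def half_track_blksize(lrecl: int, limit: int = HALF_TRACK_3390) -> int:
    """Largest multiple of lrecl that does not exceed the half-track limit."""
    if lrecl <= 0 or lrecl > limit:
        raise ValueError("lrecl must be between 1 and the half-track limit")
    return (limit // lrecl) * lrecl


if __name__ == "__main__":
    for lrecl in (80, 133, 1000):
        print(f"LRECL={lrecl}: BLKSIZE={half_track_blksize(lrecl)}")
```

For 80-byte records this gives 27920, the familiar value for card-image
data sets on 3390.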

Tuning VSAM is more complex, but the attack is normally buffers and
buffer management strategy (LSR, for example).  

Lastly, a tiny (perhaps hard to observe) increase in I/O times can and
will make huge differences in clock time.   



-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[EMAIL PROTECTED] On
Behalf Of Johnny Luo
Sent: Wednesday, October 01, 2008 9:26 AM
To: [email protected]
Subject: Long-running batch job (high CONN time)

Hi,

I'm dealing with one production job whose elapsed time has increased
dramatically in the past month. Since I'm doing this remotely and am
unable to collect relevant data myself, I must rely on the customer to
do that for me. That is not very convenient, so I must do more
'theoretical' analysis, and I don't have the luxury of using tools like
STROBE.

Simply put, the volume of input data to the job has not changed much,
according to the customer. However, the elapsed time has been increasing
over the month. The customer even ran a test of the same job with a
similar volume of input data on a sandbox. The results are as follows
(I chose only one step):

Production -
Clock: 58.8 (minutes)
TCB:  2.75
SRB: .21
EXCP: 982K
CONN: 899K

TEST System -
Clock: 10.3
TCB: 1.98
SRB: .08
EXCP: 910K
CONN: 282K


Obviously the processor should not be the main factor, because the step
is not CPU-intensive. For the same step, I believe the EXCP count is
meaningful: the program did a similar amount of I/O on both the
production system and the test system.

Then why does CONN differ? (899K vs 282K)
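
The comparison can be put in unit-free terms with a little arithmetic
on the figures above. This is only a sketch: the dictionary layout and
function names are made up for illustration, EXCP and CONN are entered
in thousands exactly as reported, and the unit behind CONN is not
stated, so only ratios are computed.

```python
# Step figures copied from the report above. Clock, TCB and SRB are in
# minutes; EXCP and CONN are in thousands (the CONN unit is not stated).
PROD = {"clock": 58.8, "tcb": 2.75, "srb": 0.21, "excp": 982, "conn": 899}
TEST = {"clock": 10.3, "tcb": 1.98, "srb": 0.08, "excp": 910, "conn": 282}


def conn_per_excp(step: dict) -> float:
    """Connect time per EXCP, in whatever unit CONN is reported in."""
    return step["conn"] / step["excp"]


def non_cpu_minutes(step: dict) -> float:
    """Elapsed minutes not accounted for by TCB plus SRB time."""
    return step["clock"] - (step["tcb"] + step["srb"])


if __name__ == "__main__":
    for name, step in (("Production", PROD), ("Test", TEST)):
        print(f"{name}: CONN/EXCP={conn_per_excp(step):.3f}, "
              f"non-CPU={non_cpu_minutes(step):.2f} min")
    print(f"CONN/EXCP ratio: {conn_per_excp(PROD) / conn_per_excp(TEST):.2f}")
    print(f"Clock ratio: {PROD['clock'] / TEST['clock']:.2f}")
```

On these numbers, connect time per EXCP is roughly three times higher
in production, while elapsed time is almost six times higher.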

From what I know, CONN can differ for the same amount of I/O if FICON is
used. FICON uses a 'switched transfer mechanism', and if too many users
are using the channel path, CONN time will increase for transferring the
same amount of data. (Another possibility is that too much I/O causes
the storage subsystem to send back the data packets slowly, thus
increasing CONN.)

So at first glance, my conclusion is that the job is spending most of
its time doing I/O (high CONN). The amount of I/O is the same, but the
system needs more time to process it. That's the cause of the
elongation.

As for why the system needs more time to process the same amount of I/O,
I believe the most likely reason is that there are other I/O-heavy jobs
running in the system at that time.

Before digging deeply into the problem, I wanna make sure that the above
conclusion is not wrong.

I also tried RMF III, and it shows device delay as the primary delay
most of the time for the job. However, WFL for the job is good: above
80%. So I don't think the device delay would cause the job to run so
slowly. Yes, sometimes it is delayed, but most of the time it gets what
it wants. High CONN does not mean high delay.

 Johnny

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
