Re: Long-running batch job (high CONN time)

Miklos Szigetvari Wed, 01 Oct 2008 08:01:34 -0700

Hi

It is difficult to advise from here. In the past we saw a more or less similar effect for some jobs, and found out that VTOC indexing was disabled for some volumes.


Johnny Luo wrote:

Hi,

I'm dealing with one production job whose elapsed time has increased
dramatically in the past month. Since I'm doing this remotely and am unable
to collect the relevant data myself, I must rely on the customer to do that
for me. That is not very convenient, so I must do more 'theoretical'
analysis. And I don't have the luxury of using tools like STROBE.

Simply put, the volume of input data to the job has not changed much
according to the customer. However, the elapsed time has been increasing
over the month. The customer even ran the same job with a similar volume
of input data on a sandbox as a test. The result is as follows (I only
chose one step):

Production -
Clock: 58.8 (minutes)
TCB: 2.75
SRB: .21
EXCP: 982K
CONN: 899K

TEST System -
Clock: 10.3
TCB: 1.98
SRB: .08
EXCP: 910K
CONN: 282K


Obviously the processor should not be the main factor, because the step is
not CPU-intensive. For the same step, I believe the EXCP count is
meaningful: the program did a similar amount of I/O on both the production
system and the test system.

Then why does CONN differ? (899K vs 282K)
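To make that comparison concrete, here is a small sketch that normalises the quoted figures per EXCP. The post does not state the unit of CONN, so only unit-free ratios are computed; the dictionary keys and the function name are illustrative, not taken from any tool.

```python
# Figures copied from the step statistics quoted above (K = thousand).
# The unit of CONN is not given in the post, so only ratios are computed.
steps = {
    "production": {"clock_min": 58.8, "excp_k": 982, "conn_k": 899},
    "test": {"clock_min": 10.3, "excp_k": 910, "conn_k": 282},
}


def conn_per_excp(step):
    """CONN units consumed per EXCP (a unit-free comparison figure)."""
    return step["conn_k"] / step["excp_k"]


prod = conn_per_excp(steps["production"])
test = conn_per_excp(steps["test"])

# How much more I/O, connect time per I/O, and elapsed time production shows.
excp_ratio = steps["production"]["excp_k"] / steps["test"]["excp_k"]
conn_per_excp_ratio = prod / test
clock_ratio = steps["production"]["clock_min"] / steps["test"]["clock_min"]

print(f"CONN per EXCP, production: {prod:.2f}")
print(f"CONN per EXCP, test:       {test:.2f}")
print(f"EXCP ratio (prod/test):          {excp_ratio:.2f}")
print(f"CONN-per-EXCP ratio (prod/test): {conn_per_excp_ratio:.2f}")
print(f"Elapsed ratio (prod/test):       {clock_ratio:.2f}")
```

On these numbers the EXCP counts are within about 8% of each other, CONN per EXCP is roughly three times higher in production, and elapsed time is roughly 5.7 times longer.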

From what I know, CONN can differ for the same amount of I/O if FICON is
used. FICON uses a 'switched transfer mechanism', and if too many users are
using the channel path, CONN time will increase for transferring the same
amount of data. (Another possibility is that too much I/O causes the
storage subsystem to send back the data packets slowly, thus increasing
CONN.)

So at first glance, my conclusion is that the job is spending most of its
time doing I/O (high CONN). The amount of I/O is the same, but the system
needs more time to process it. That's the cause of the elongation.

As for why the system needs more time to process the same amount of I/O, I
believe the most likely reason is that there are other I/O-heavy jobs
running in the system at that time.

Before digging deeper into the problem, I want to make sure that the above
conclusion is not wrong.

I also tried RMF III, and it shows device delay as the primary delay most
of the time for the job. However, WFL for the job is good: above 80%. So I
don't think the device delay would cause the job to run so slowly. Yes,
sometimes it is delayed, but most of the time it gets what it wants. High
CONN does not mean high delay.
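The WFL argument can be sketched numerically. This assumes the usual RMF definition of workflow as using samples divided by using plus delay samples, and makes the simplifying assumption that the 80% figure holds for the whole step. The sample counts and function names below are hypothetical, chosen only to give 80%.

```python
def workflow_pct(using_samples, delay_samples):
    """Workflow % as using / (using + delay) * 100 (assumed RMF definition)."""
    return 100.0 * using_samples / (using_samples + delay_samples)


def delay_free_elapsed(elapsed_min, wfl_pct):
    """Rough lower bound on elapsed time if every delay were removed.

    Simplifying assumption: the workflow percentage applies uniformly
    to the whole elapsed time of the step.
    """
    return elapsed_min * wfl_pct / 100.0


# Hypothetical sample counts that give the 80% workflow mentioned above.
wfl = workflow_pct(using_samples=80, delay_samples=20)

# 58.8 minutes is the production elapsed time quoted earlier.
best_case = delay_free_elapsed(58.8, wfl)

print(f"Workflow: {wfl:.0f}%")
print(f"Elapsed time with no delay at all: about {best_case:.1f} minutes")
```

Under those assumptions, removing every delay would still leave the step at about 47 minutes, far above the 10.3 minutes seen on the test system, which is consistent with the view that delay alone does not explain the elongation.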

Johnny

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html


--
Miklos Szigetvari

Development Team

ISIS Information Systems Gmbh
tel: (+43) 2236 27551 570
Fax: (+43) 2236 21081
E-mail: [EMAIL PROTECTED]
Info: [EMAIL PROTECTED]
Hotline: +43-2236-27551-111
Visit our Website: http://www.isis-papyrus.com
---------------------------------------------------------------

This e-mail is only intended for the recipient and not legally
binding. Unauthorised use, publication, reproduction or
disclosure of the content of this e-mail is not permitted.
This email has been checked for known viruses, but ISIS accepts
no responsibility for malicious or inappropriate content.

---------------------------------------------------------------
