With "several hundreds if not thousands jobs running at any give time" you must be using an automated job scheduler of some sort. Does your scheduler maintain an 'average run time' in its database? Does the scheduler itself allow notification if that average time is exceeded?
If not, you can use MXG, or other means, to calculate statistics about run time. If you have to get notification in real time, you're going to need to code some stuff in exits.

One approach (I haven't coded this, so bear that in mind):

1. Store the average time somewhere (MXG, wherever).
2. For each job, put that time into a comment card in the JCL. Use a fixed, easy-to-parse format.
3. In exit IEFUJV, parse that comment card to convert the average run time to binary, and store it in a storage area you GETMAIN for each job (make sure to store the pointer in the user area for the job provided by the SMF user exit interface). Mark a flag in that area as 'no msg issued'.
4. In exit IEFUJI, get the current time, add the average run time to it, and store the result in your job 'control block'.
5. In IEFU83 (shouldn't need IEFU84/85 for regular batch jobs, I wouldn't think?), get the current time and compare it to what you stored. If it is larger than what you stored and the flag still says 'no msg issued', issue a message via WTO and mark the flag as 'message issued'. This is to prevent issuing the message more than once for the job.

IEFU83 gets called a lot since it is called for every SMF record created for a job, so make sure your routine is coded efficiently.

This is off the top of my head. I used IEFU83 to provide the ability to notify as soon as you exceeded what you expected; if you don't need to be that granular, use IEFACTRT for step end, or IEFUSI at step initiation, and check between steps.

The 'hard part' is getting the information into the system with the job.
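To make the flow concrete, here is a rough sketch of the logic only, written in Python so it can be run off-host. It is not exit code: real exits would be written in assembler against the SMF exit parameter lists, which are not modeled here. The comment-card format (`//* AVGRUN=hh:mm:ss`), the message ID `USR001I`, and every function and field name below are assumptions made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical fixed format for the comment card, e.g. "//* AVGRUN=00:12:30".
AVGRUN_CARD = re.compile(r"^//\*\s*AVGRUN=(\d{2}):(\d{2}):(\d{2})\s*$")


@dataclass
class JobArea:
    """Stand-in for the per-job storage area the exit would GETMAIN."""
    avg_seconds: int                  # average run time, converted to binary
    deadline: Optional[float] = None  # start time + average, set at job initiation
    msg_issued: bool = False          # 'no msg issued' until the WTO goes out


def parse_avgrun(card: str) -> Optional[int]:
    """Return the average run time in seconds, or None if this is not our card."""
    m = AVGRUN_CARD.match(card[:72])  # ignore sequence numbers in columns 73-80
    if m is None:
        return None
    hours, minutes, seconds = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def on_job_validation(cards: list[str]) -> Optional[JobArea]:
    """IEFUJV step: find the comment card and build the per-job area."""
    for card in cards:
        secs = parse_avgrun(card)
        if secs is not None:
            return JobArea(avg_seconds=secs)
    return None


def on_job_initiation(area: JobArea, now: float) -> None:
    """IEFUJI step: current time plus average run time becomes the deadline."""
    area.deadline = now + area.avg_seconds


def on_smf_record(area: JobArea, now: float) -> Optional[str]:
    """IEFU83 step: return message text once, the first time the deadline passes."""
    if area.deadline is None or area.msg_issued:
        return None  # cheap early exit, since this runs for every SMF record
    if now > area.deadline:
        area.msg_issued = True
        return f"USR001I JOB EXCEEDED AVERAGE RUN TIME OF {area.avg_seconds} SECONDS"
    return None
```

The flag check comes first in `on_smf_record` so that a job which has already been reported costs almost nothing on later calls, which matters given how often IEFU83 is driven.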
