One more recovery war story. So old that I cannot find a reference in Google. 
JES2 followed a long tradition when they added support in the 1980s for the 
8100 high speed laser printer. While older clackety-clack devices were fairly 
easy to handle, the 8100 was so complex that JES2 recovery code got more and 
more convoluted. At one point, developers decided that chuck-it-and-start-over 
was a more effective short-term strategy than trying to handle any of an 
increasing number of anomalies. The easiest way to achieve a complete reset was a 
'controlled abend' in JES2 itself. Recovery code was that good.

So a 'fix' was released where at various points of unresolvable 8100 conundrum, 
a branch would be deliberately taken to a known abend location. This location 
would contain unexecutable code that would serve as an eyecatcher in S0C1 
analysis. Existing recovery routines were so good that the environment would be 
cleaned up and JES2 execution would continue after displaying some diagnostic 
printer information. Problem: the 'unexecutable code' was an EBCDIC string that 
looked to the CPU like a packed decimal instruction. So instead of the expected 
S0C1, JES2 took a S0C7 that the recovery routine thought was a true code 
failure. JES2 would go down. It didn't take long for an APAR fix that inserted 
binary zeroes in front of the EBCDIC string, but it was all pretty 
embarrassing. Or so I was told.   
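
For anyone who wants to see why the first byte of an eyecatcher matters, here 
is a rough Python sketch. It is not the JES2 code. The eyecatcher text 
"8100RESET" and the function name likely_abend are made up for illustration, 
and the opcode table is a small, deliberately incomplete subset of the decimal 
instructions. The point is that EBCDIC digits are X'F0' through X'F9', which 
overlaps the decimal opcodes, while X'00' is never a valid opcode.

```python
# Subset of decimal (SS-format) opcodes that can raise a data exception
# when their operands are not valid packed decimal. Not a complete list.
DECIMAL_OPCODES = {
    0xF0: "SRP",
    0xF8: "ZAP",
    0xF9: "CP",
    0xFA: "AP",
    0xFB: "SP",
    0xFC: "MP",
    0xFD: "DP",
}


def likely_abend(storage: bytes) -> str:
    """Guess the program check from the first byte at the branch target."""
    opcode = storage[0]
    if opcode == 0x00:
        # X'00' is not an assigned opcode: operation exception.
        return "S0C1"
    if opcode in DECIMAL_OPCODES:
        # The CPU tries to execute it; garbage operands can give a S0C7.
        return "S0C7 risk (" + DECIMAL_OPCODES[opcode] + ")"
    return "unknown"


# Hypothetical eyecatcher text, encoded as EBCDIC (code page 037).
EYECATCHER = "8100RESET".encode("cp037")

print(likely_abend(EYECATCHER))                # eyecatcher alone
print(likely_abend(b"\x00\x00" + EYECATCHER))  # binary zeroes in front
```

With the zeroes in front, the first thing the CPU sees is an invalid opcode, 
which is what the recovery routine was expecting all along.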

.
.
.
J.O.Skip Robinson
Southern California Edison Company
Electric Dragon Team Paddler 
SHARE MVS Program Co-Manager
626-302-7535 Office
323-715-0595 Mobile
jo.skip.robin...@sce.com

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Leonardo Vaz
Sent: Monday, September 21, 2015 7:38 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Recovery routine for ICHRTX00

Thank you all very much for your inputs! I keep finding assembler more 
interesting the more I play with it.

The main reason I wanted a recovery routine for this exit is that, as Walt 
described, an abend in the routine won't disable it (there is no simple way 
to disable it, other than to "zap" an IEFBR14 on top of it), and it directly 
affects the issuer of the RACROUTE macro. A simple recovery routine that 
terminates processing and returns with zero in R15 would pass control to the 
ESS.

It is very interesting that you lean against recovery routines on system exits. 
I was wanting to add some to our JES2 type 5 exits to prevent JES2 termination 
from an abend in a custom display command, but I guess proper testing should 
prevent abends anyway. That's just something I wouldn't get from manuals, so 
thanks for the hint.

I am trying to use this exit with REQUEST=FASTAUTH,CLASS=PROGRAM, which means a 
local lock may be held and I may be in problem state/key8 so I don't think I 
can really set a recovery routine in that state anyway, can I?

Having to IPL to implement the routine or fall back from it means I will never 
really want to put it in a production environment.

Thanks again and best regards,
Leo

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of J O Skip Robinson
Sent: Saturday, September 19, 2015 5:10 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Recovery routine for ICHRTX00

My view of recovery routines may have been jaded by an experience early in my 
career. We sysprogs were implored to debug an application that was taking truly 
bizarre abends. One repeatable abend was in a DFSORT routine. Another was in a 
printer error handling routine. Trouble was, it was a straight COBOL program 
that was neither sorting nor printing. 

After spending hours on a Saturday morning with various sysprog tricks and 
tools, the problem boiled down to this. At initialization, the application 
called a years-old recovery setup program that no one knew anything about. 
Somewhere along the line, the COBOL program took a S0C7 as application programs 
are wont to do. The recovery routine got control and screwed up registers, 
which led to a wild branch that seemed most often to land in the middle of 
either DFSORT or hardware error correction modules. That inevitably led to a 
head-scratching S0C4. 

The solution was to replace the old recovery setup program with an IEFBR14. I 
have not been a fan of recovery routines ever since.  


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Peter Relson
Sent: Saturday, September 19, 2015 6:07 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Recovery routine for ICHRTX00

>I lean against recovery actions for system exits

I don't specifically disagree with this. 

Three main reasons to have recovery come to mind
-- to release resources that you have obtained
-- to gather diagnostic data to help debug your error
-- to protect your caller from unexpectedly getting control in its recovery

The first is critical. But most exits do not obtain any resources (especially 
if they can be given dynamic storage to work with).
The second is, to a customer, "up to you".
The importance of the third depends on the code calling the exit.

Peter Relson
z/OS Core Technology Design
