"Rather than trying to infer the cause by tweaking everything in sight, how 
about this: Set a SLIP PER trap to catch an SVC dump for the first abend of any 
kind, S0C4 or otherwise. That dump may tell you the exact cause far more 
quickly than trial-and-error with recompiles/reassemblies/logic-changes."

Isn't there already a dump?

If there isn't, I absolutely agree. Todge about with things, and you change 
things.

Program A CALLs Program B. Program B goes into a Big Fat Loop.

LOOP.
do some stuff
IF loopcounter NOT GREATER THAN +5
    ADD 1 TO loopcounter
    GO TO LOOP.

Programmer "adds a DISPLAY" to B.
No loop.
Programmer removes added DISPLAY.
Loop.
Programmer adds DISPLAY to A.
No loop.
Programmer removes added DISPLAY.
Loop.
Programmer adds DISPLAY to both programs (yes, they did)
No loop.

Faffing-about continues for a couple of days, but the only time it loops is 
with the programs unchanged.

Comes to me with a big smile and "I've found a compiler bug".

I looked at the above code, and asked what was the last code change he'd made 
before the first loop. He showed me in Program A. Utterly wild subscript, 
MOVEing LOW-VALUES to a one-byte field within a large group item subordinate to 
an OCCURS, conditionally.

That stray byte just happened to land on the low-order byte of the binary one in 
the Literal Pool of Program B. So adding the "constant" literal value one was 
actually adding zero, and the loop could never terminate.
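
For anyone who wants to see the mechanism without a mainframe to hand, here is a rough Python sketch. It is a simulation only, not what OS/VS COBOL generates: the 16-byte storage area, the offset 8 and the names run_loop and LITERAL_OFFSET are all made up for illustration. The only things taken from the story are the loop test, the halfword literal one, and the single byte of binary zeros.

```python
import struct

def run_loop(storage, literal_offset, max_passes=1000):
    """Mimic the loop in Program B, fetching the increment from 'storage'.

    Returns (passes, terminated). max_passes is only a safety net so the
    demonstration itself cannot loop forever.
    """
    counter = 0
    for passes in range(1, max_passes + 1):
        # "do some stuff" would happen here
        if counter > 5:                      # IF loopcounter NOT GREATER THAN +5 fails
            return passes, True              # fall out of the loop
        # ADD 1 TO loopcounter: the "1" is a big-endian halfword in the pool
        (increment,) = struct.unpack_from(">h", storage, literal_offset)
        counter += increment
    return max_passes, False

LITERAL_OFFSET = 8                           # hypothetical position of the literal
storage = bytearray(16)                      # hypothetical slice of Program B
struct.pack_into(">h", storage, LITERAL_OFFSET, 1)   # X'0001'

clean_result = run_loop(storage, LITERAL_OFFSET)

# The wild subscript in Program A: one byte of LOW-VALUES on the low-order byte
storage[LITERAL_OFFSET + 1] = 0x00           # literal is now X'0000'
damaged_result = run_loop(storage, LITERAL_OFFSET)
```

With the literal intact the loop falls through on its seventh pass. With the low-order byte zeroed the counter never moves, and only the safety net stops it.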

A handful of other scattered single-byte binary-zeros had had no obvious effect 
amongst all his "compiler bug" hunting.

But. Put in a DISPLAY, with a nice literal, prior to the use of +1, and the 
location of the +1 is changed (OS/VS COBOL doesn't define literals in reverse 
order), so now, for the loop, it is +1. Add a DISPLAY to Program A, and the 
Program B code "moves", and again the +1 in the literal pool is preserved.
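
The "moving" can be sketched the same way, in Python rather than COBOL. Again this is an illustration under loud assumptions: build_literal_pool, STRAY_OFFSET and the message text are invented, the literals are simply laid out back to back in order of first use, and alignment and EBCDIC are ignored. The point is only that the stray store hits a fixed place, while the literal does not stay put.

```python
import struct

def build_literal_pool(literals):
    """Lay literals out back to back, in the order given.

    Returns (pool, offsets) where offsets maps each literal's name to its
    position in the pool. A simplification, not the compiler's real layout.
    """
    pool = bytearray()
    offsets = {}
    for name, raw in literals:
        offsets[name] = len(pool)
        pool += raw
    return pool, offsets

ONE = struct.pack(">h", 1)        # the halfword literal one, X'0001'
STRAY_OFFSET = 1                  # hypothetical: where the wild store always lands

def one_after_stray_store(pool, offsets):
    """Apply the single-byte overwrite and return what 'one' now contains."""
    damaged = bytearray(pool)
    damaged[STRAY_OFFSET] = 0x00
    (value,) = struct.unpack_from(">h", damaged, offsets["ONE"])
    return value

# Unchanged program: the literal one sits right where the stray byte lands
unchanged = build_literal_pool([("ONE", ONE)])

# DISPLAY added ahead of the loop: its literal goes first, pushing "one" along
with_display = build_literal_pool([("MSG", b"IN LOOP"), ("ONE", ONE)])

value_unchanged = one_after_stray_store(*unchanged)
value_with_display = one_after_stray_store(*with_display)
```

In the unchanged layout the literal comes back as zero. With the DISPLAY literal in front, the zero byte lands in the message text instead and the literal one survives, which is why every "fix" appeared to work.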

Cancel, with a dump. Look at the dump. Look at the code for the loop, and 
there's X'0000' where X'0001' should be in the Literal Pool of Program B.

If ever overwriting of executable code is suspected, adding code in front of the 
problem is going to shift it at best, and mask it at worst.

Bust that dump. Guessing and DISPLAY I find of little value. Obtaining all the 
information possible, and going forward with what matches all the information 
(cue various Sherlock Holmes quotes) I find of great value. A dump (and for 
sure a full one) mostly has all the information needed.

Having said that, something that happens with record 17, which causes a failure 
when record 291,347 is reached, is more problematic - except that deduction 
leads you there, and at times surprisingly fast.

The information available so far does not indicate that this can be an 
addressing problem due to 31/24. The AMBLIST Module Summaries are the same. 

OK, if Peter Ten Eyck remains quiet on the subject from now on, there's that 
outside possibility that the same report was compared to itself, and the actual 
new report reveals exactly what people have been suggesting is possible :-)
