"Rather than trying to infer the cause by tweaking everything in sight, how
about this: Set a SLIP PER trap to catch an SVC dump for the first abend of any
kind, S0C4 or otherwise. That dump may tell you the exact cause far more
quickly than trial-and-error with recompiles/reassemblies/logic-changes."
Isn't there already a dump?
If there isn't, I absolutely agree. Todge about with things, and you change
things.
Program A CALLs Program B. Program B goes into a Big Fat Loop.
LOOP.
    do some stuff
    IF loopcounter NOT GREATER THAN +5
        ADD 1 TO loopcounter
        GO TO LOOP.
Programmer "adds a DISPLAY" to B.
No loop.
Programmer removes added DISPLAY.
Loop.
Programmer adds DISPLAY to A.
No loop.
Programmer removes added DISPLAY.
Loop.
Programmer adds DISPLAY to both programs (yes, they did)
No loop.
Faffing-about continues for a couple of days, but the only time it loops is
with the programs unchanged.
Comes to me with a big smile and "I've found a compiler bug".
I looked at the above code, and asked what was the last code change he'd made
before the first loop. He showed me in Program A. Utterly wild subscript,
MOVEing LOW-VALUES to a one-byte field within a large group item subordinate to
an OCCURS, conditionally.
It just happened to hit the low-order byte of the binary one in the Literal Pool
of Program B. So adding the "constant" literal value one was actually adding
zero, and the loop could never terminate.
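For anyone who wants to see the effect in miniature, here is a rough Python
simulation. It is not COBOL and it makes no claim about how OS/VS COBOL really
lays out storage: the 16-byte buffer, the offset 8 for the halfword +1, and the
iteration guard are all invented for illustration.

```python
import struct

def run_loop(storage, one_offset, max_iterations=1000):
    """Mimic the COBOL loop: IF counter NOT > 5, ADD 1, GO TO LOOP.

    The "1" is fetched from storage each time, as it would be from a
    literal pool. Returns the iteration count, or None if the guard trips
    (our stand-in for "Big Fat Loop").
    """
    counter = 0
    iterations = 0
    while counter <= 5:
        if iterations >= max_iterations:
            return None
        # Big-endian signed halfword, like a binary +1 held as X'0001'.
        (increment,) = struct.unpack_from(">h", storage, one_offset)
        counter += increment
        iterations += 1
    return iterations

# Hypothetical layout: the halfword +1 sits at offset 8 of a 16-byte area.
ONE_OFFSET = 8
storage = bytearray(16)
struct.pack_into(">h", storage, ONE_OFFSET, 1)

clean = run_loop(storage, ONE_OFFSET)

# The stray one-byte MOVE of LOW-VALUES lands on the low-order byte.
storage[ONE_OFFSET + 1] = 0x00
corrupted = run_loop(storage, ONE_OFFSET)

print(clean, corrupted, storage[ONE_OFFSET:ONE_OFFSET + 2].hex())
```

With the literal intact the loop ends after six passes. Once the low-order byte
is zeroed, the "add 1" adds nothing and only the artificial guard stops it.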
A handful of other scattered single-byte binary zeros had had no obvious effect
amongst all his "compiler bug" hunting.
But. Put in a DISPLAY, with a nice literal, prior to the use of +1, and the
location of the +1 is changed (OS/VS COBOL doesn't define literals in reverse
order), so now, for the loop, it is +1. Add a DISPLAY to Program A, and the
Program B code "moves", and again the +1 in the literal pool is preserved.
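The "it moves, so it works" effect can be sketched the same way. Again this is
a Python toy, not the compiler's real behaviour. As a simplifying assumption
the stray store is treated as hitting one fixed address (STRAY_ADDRESS), and
the literal names, sizes, and base offsets are hypothetical.

```python
import struct

STRAY_ADDRESS = 9  # hypothetical fixed address hit by the wild subscript

def build_pool(literals):
    """Lay literals out one after another; return the pool and name->offset."""
    pool = bytearray()
    offsets = {}
    for name, raw in literals:
        offsets[name] = len(pool)
        pool += raw
    return pool, offsets

def plus_one_after_stray_store(literals, base=0):
    """Value of the +1 literal after the stray one-byte zero is stored.

    base is where Program B's pool starts; it grows if Program A gets bigger.
    """
    pool, offsets = build_pool(literals)
    target = STRAY_ADDRESS - base
    if 0 <= target < len(pool):
        pool[target] = 0x00  # the one-byte LOW-VALUES
    (value,) = struct.unpack_from(">h", pool, offsets["PLUS-ONE"])
    return value

plus_one = ("PLUS-ONE", struct.pack(">h", 1))
filler = ("FILLER", b"\x40" * 8)
display_text = ("DISPLAY-TEXT", b"IN PROGRAM B")

# Unchanged programs: the +1 occupies bytes 8-9, so byte 9 gets zeroed.
unchanged = plus_one_after_stray_store([filler, plus_one])

# DISPLAY added to B ahead of the +1: the literal is pushed further along.
display_in_b = plus_one_after_stray_store([filler, display_text, plus_one])

# DISPLAY added to A: all of B, pool included, starts 12 bytes later.
display_in_a = plus_one_after_stray_store([filler, plus_one], base=12)

print(unchanged, display_in_b, display_in_a)
```

In both "fixed" cases the stray zero still gets stored. It just lands somewhere
that nobody notices, which is why the DISPLAYs appeared to cure the loop.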
Cancel, with a dump. Look at the dump. Look at the code for the loop, and
there's X'0000' where X'0001' should be in the Literal Pool of Program B.
If ever overwriting of executable code is suspected, adding code in front of the
problem is going to shift it at best, and mask it at worst.
Bust that dump. Guessing and DISPLAY I find of little value. Obtaining all the
information possible, and going forward with what matches all the information
(cue various Sherlock Holmes quotes) I find of great value. A dump (and for
sure a full one) mostly has all the information needed.
Having said that, something that happens with record 17, which causes a failure
when record 291,347 is reached, is more problematic - except that deduction
leads you there, and at times surprisingly fast.
The information available so far does not indicate that this can be an
addressing problem due to 31/24. The AMBLIST Module Summaries are the same.
OK, if Peter Ten Eyck remains quiet on the subject from now on, there's that
outside possibility that the same report was compared to itself, and the actual
new report reveals exactly what people have been suggesting is possible :-)