Tim,
As I said previously, you make some cogent and relevant points, but to the rest
of us (well, at least to me) one of the key problems in the OP's situation is
still the organizational one, which you did not address.
The OP stated: "Our development management are telling us (Systems &
Operations) that . . .".
All of your plausible arguments that a hardware upgrade might (due to various
factors) in fact be cheaper for the company do not address this glaring issue.
Unless the OP is not telling us everything (which I grant is possible, maybe
even probable), it appears that a tail is trying to wag the dog.
As a long-time application developer who has experienced more than a few
different management organizational paradigms, I find that this one is way
outside any box.
And you also did not completely address the OP's description of a truly
horrendous, bug-ridden, CICS-crashing initial implementation. There is NO
excuse (regulatory or otherwise) for an implementation of obviously untested
code. I grant that certain kinds of bugs, like the memory leak you mentioned,
can be monitored and mitigated proactively while repair efforts continue in
parallel. BUT those errors must first be found during testing, not discovered
in production. The kinds of errors the OP reported (". . .
multiple 0C4 and 0C7 - ASRA Abends, Storage Violations, and one CICS Task
abended in a loss of our main production CICS Region") are surely serious
enough for very senior management to be involved, and not just "development
management". I just cannot conceive of any business justification for pushing
untested code into production.
Your voice is rational but the initial situation the OP described must surely
be intolerable to any viable business.
But maybe my imagination is too limited.
Peter
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf
Of Timothy Sipples
Sent: Thursday, April 09, 2015 1:26 AM
To: [email protected]
Subject: Re: A New Performance Model
Shmuel Metz writes:
>IMHO, rolling them out with bugs is even more expensive, especially
>bugs that cause noncompliance with regulations. Testing encompasses
>much more than just performance.
I see we have another "Doctor No" in the audience. :-)
How do YOU know it's "even more expensive"? *Obviously* you don't for an
unspecified hypothetical.
You haven't articulated an inviolable physical law here; this issue is
situational, with really no absolutes. As a real
example, I worked with a customer that had a memory leak in their
application program (Java, as it happens), so eventually the heap would be
fully consumed and garbage collection couldn't free up space. They couldn't
quite stomp out the root cause(s) of the memory leak, but they faced a
government-mandated deadline to get their application into service. If they
failed to deliver they'd get fined or potentially even lose the whole
contract. So (upon my recommendation) they worked around that problem by
implementing:
- two production instances (probably a good idea anyway);
- better monitoring, to improve diagnostics so they could track the memory
  leak more closely and provide better feedback to development;
- fully automated recycling of each production instance before the memory
  leak could cause damage;
- and (optionally, but I think they did it) session persistence, so user
  state would be retained across instances, avoiding end user disruptions.
That multifaceted mitigation strategy tided them over
nicely until they could get the memory leak fixed, which they did
post-production in a later program update. That was the right decision for
the business, unequivocally. They were very happy, and so was the
government. Was Version 1.0 bug free? Was Version 1.0 as well optimized as
it could have been? No, and no.
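The recycling strategy above can be sketched in miniature. This is a toy model
only: the real application was Java, and the instance names, heap figures,
threshold, and leak rate below are all invented for illustration. The essential
idea is a watchdog that recycles the leaking active instance well before heap
exhaustion, shifting traffic to a standby so users never notice.

```python
# Illustrative model of the "automated recycling" mitigation described
# above. All names and numbers here are invented for the sketch.

HEAP_LIMIT_MB = 4096        # assumed per-instance heap ceiling
RECYCLE_THRESHOLD = 0.80    # recycle well before the heap is exhausted

class Instance:
    """A production application instance with a slow memory leak."""
    def __init__(self, name):
        self.name = name
        self.heap_used_mb = 0.0

    def leak(self, mb):
        # Simulate the unresolved leak growing the heap each interval.
        self.heap_used_mb += mb

    def recycle(self):
        # Restarting the process reclaims all leaked heap.
        self.heap_used_mb = 0.0

def watchdog_tick(active, standby):
    """If the active instance nears heap exhaustion, shift traffic to
    the standby (session persistence preserves user state) and recycle
    the drained instance. Returns (active, standby) after the tick."""
    if active.heap_used_mb / HEAP_LIMIT_MB >= RECYCLE_THRESHOLD:
        active.recycle()
        return standby, active   # roles swap; users see no disruption
    return active, standby

# Simulate a stretch of production time: the leak grows steadily, and
# the watchdog recycles the active instance before any damage occurs.
a, b = Instance("prod-1"), Instance("prod-2")
active, standby = a, b
for _ in range(100):
    active.leak(50)                        # ~50 MB leaked per interval
    active, standby = watchdog_tick(active, standby)
```

In this run neither instance ever reaches the heap limit: prod-1 is recycled
once its heap crosses the 80% threshold, and prod-2 carries the load from
then on, which is the whole point of recycling proactively rather than
waiting for garbage collection to fail.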
As an aside, how many of you are still IPLing periodically due to a bug
corrected three decades ago? That's typically rigid nonsense, too, yet I'm
apparently surrounded by operational dogma. I feel
outnumbered. :-)
If I'm not responsible for making the final decisions, my fundamental
approach is to characterize for management, fairly and candidly, with no
b.s., the risks that I know about and, if possible, list out all the
options. Then let them call the *@#$ play. There are a few bright lines
that cannot be crossed, even by management, but "wait for 100% bug free,
fully performance optimized code" isn't actually one of those bright lines.
Otherwise nothing would ever ship, and "known issues" lists wouldn't even
exist. They do, of course.
One of the major value propositions of the mainframe ecosystem is that its
designers assumed correctly, long ago, that developers -- and operators,
for that matter -- are occasionally fallible, but business must still get
done and done well. Thus at every level in the platform architecture there
are rich, robust defenses against human error. That doesn't mean operators
trying to say "No" to everything "new" is helpful. Quite the opposite.
My views are my own, as always. Even when I seem to be the only rational
voice in the room. :-)
--------------------------------------------------------------------------------------------------------
Timothy Sipples
IT Architect Executive, Industry Solutions, IBM z Systems, AP/GCG/MEA
E-Mail: [email protected]
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN