Hi All,

I have a couple of points to make on this topic:

1)  For all the well-documented (and well-taken) information about the
importance of shutting down a system in an orderly, clean fashion, I find it
hard to remember when - in my experience - the system ever had
problems coming back up after a hard crash (I worked in OPS for 10+
yrs.).  Maybe back in the 308x days, but 3090+ ??  The hardware was
pretty resilient, as I remember.  I'm not saying that I recommend
anything other than a clean shutdown, but....

2)  Kelly's post harkens me back to an old pet peeve, and that is:
Operations *used* to have good, knowledgeable people who could make
decisions without calling 5 people to tell them what to do!  I saw,
firsthand, the dumbing down of OPS and it disturbed me greatly.  I had
mgmt come into Operations where I worked that *never* wanted to be at
fault... that was truly their #1 priority.  They achieved this through
never making a darn decision on their own... never sticking their neck
out no matter what the situation.  I remember one time when I
restarted the master catalog to resolve a problem, as called for by
the manual (ok... I think I could have gotten away with a lesser
evil), but my point is that my mgmt thought I was nuts (and just
lucky).  Maybe so, but as long as we staff OPS with zombies instead of
people who will take action based on knowledge and experience (and who
are - most importantly - empowered to do so), more money must be spent
on hardware, systems and automation to take their place.

Just my thoughts...

All the best,
Scott T. Harder

> Kelly Bert Manning wrote:
>> Please don't laugh.
>>
>> I work with applications on a non-sysplex and non-xrf, supported, z/OS
>> where there have been 3 cases of UPS batteries draining flat,
>> followed by uncontrolled server crashes, in the past 17 years.
>>
>> They all happened in October and November, gale season (Cue background
>> music with the "Gales of November" line by Gordon Lightfoot)
>>
>> After the first one the data center operator said that they would consider
>> giving operators authority to shut down OS/390 if they were unable to
>> make immediate contact with the "Duty Manager" after discovering that
>> UPS batteries were draining during a power failure and that generator
>> power was not available or failed after starting.
>>
>> Four weeks later a carbon copy crash occurred, inspiring a promise that
>> operators would start draining CICS and IMS message queues and stopping
>> and rolling back BMPs and DB2 online jobs, while there was still power
>> in batteries.
>>
>> Roll forward to this decade, power off during gale season, generators
>> start, but one fails and goes offline, followed by other mayhem in the
>> power hardware. Back on batteries for 22 minutes, until they drain and
>> the z server crashes. Current operator says "what promise to shut
>> everything down cleanly before the batteries drain?".
>>
>> Is 22 minutes an unreasonable time figure for purging IMS message
>> queues, bringing down CICS regions, draining initiators, and abending
>> and rolling back online IMS and DB2 jobs to the last checkpoint, swapping
>> logs, writing and dismounting log backups and turning off power before
>> sudden power loss starts to play mayhem with disk and other hardware?
>>
>> Oh did I mention, the 2 CPU single processor was only about 30% busy at
>> the time, the Sunday weekly low CPU use period.
>>
>> We had a different sort of power outage after the first of the 2 crashes
>> last decade. Somebody working for one of the potential bidders used
>> a metal tape measure in an attempt to measure clearance around the
>> power cable entrance to the building. The resulting demonstration of
>> how much power moves through the space around a high voltage cable
>> destroyed several 3380 clone drives, in addition to crashing all
>> the OS/390 processors. I earned my DBA pay that day.
>>
>> Bottom line, what should happen when UPS batteries start to drain and
>> there is no prospect of reliable, high quality, utility power being
>> restored quickly? Leave it up and roll the dice about losing work
>> in progress and log data (head crashes and cache controller microcode
>> bugs) or shut it down cleanly?
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: GET IBM-MAIN INFO
> Search the archives at http://bama.ua.edu/archives/ibm-main.html
>

