Hi All, I have a couple of points to make on this topic:
1) For all the well-documented (and well-taken) information about the importance of shutting a system down in an orderly, clean fashion, I find it hard to remember a time, in my experience, when the system had problems coming back up after a hard crash (I worked in OPS for 10+ yrs.). Maybe back in the 308x days, but 3090 and later? The hardware was pretty resilient, as I remember. I'm not saying that I recommend anything other than a clean shutdown, but....

2) Kelly's post harkens me back to an old pet peeve, and that is: Operations *used* to have good, knowledgeable people who could make decisions without calling 5 people to tell them what to do! I saw, firsthand, the dumbing down of OPS and it disturbed me greatly. I had mgmt come into Operations where I worked that *never* wanted to be at fault... that was truly their #1 priority. They achieved this by never making a darn decision on their own... never sticking their necks out, no matter what the situation.

I remember one time when I restarted the master catalog to resolve a problem, as called for by the manual (ok... I think I could have gotten away with a lesser evil), but my point is that my mgmt thought I was nuts (and just lucky). Maybe so, but as long as we put zombies in Operations instead of people who will take action based on knowledge and experience (and who are, most importantly, empowered to do so), more money must be spent on hardware, systems and automation to take their place.

Just my thoughts...

All the best,
Scott T. Harder

> Kelly Bert Manning wrote:
>> Please don't laugh.
>>
>> I work with applications on a non-sysplex and non-xrf, supported, z/OS
>> where there have been 3 cases of UPS batteries draining flat,
>> followed by uncontrolled server crashes, in the past 17 years.
>>
>> They all happened in October and November, gale season (Cue background
>> music with the "Gales of November" line by Gordon Lightfoot)
>>
>> After the first one the data center operator said that they would consider
>> giving operators authority to shut down OS/390 if they were unable to
>> make immediate contact with the "Duty Manager" after discovering that
>> UPS batteries were draining during a power failure and that generator
>> power was not available or failed after starting.
>>
>> Four weeks later a carbon copy crash occurred, inspiring a promise that
>> operators would start draining CICS and IMS message queues and stopping
>> and rolling back BMPs and DB2 online jobs, while there was still power
>> in batteries.
>>
>> Roll forward to this decade, power off during gale season, generators
>> start, but one fails and goes offline, followed by other mayhem in the
>> power hardware. Back on batteries for 22 minutes, until they drain and
>> the z server crashes. Current operator says "what promise to shut
>> everything down cleanly before the batteries drain?".
>>
>> Is 22 minutes an unreasonable time figure for purging IMS message
>> queues, bringing down CICS regions, draining initiators, and abending
>> and rolling back online IMS and DB2 jobs to the last checkpoint, swapping
>> logs, writing and dismounting log backups and turning off power before
>> sudden power loss starts to play mayhem with disk and other hardware?
>>
>> Oh, did I mention, the 2 CPU single processor was only about 30% busy at
>> the time, the Sunday weekly low CPU use period.
>>
>> We had a different sort of power outage after the first of the 2 crashes
>> last decade. Somebody working for one of the potential bidders used
>> a metal tape measure in an attempt to measure clearance around the
>> power cable entrance to the building.
>> The resulting demonstration of
>> how much power moves through the space around a high voltage cable
>> destroyed several 3380 clone drives, in addition to crashing all
>> the OS/390 processors. I earned my DBA pay that day.
>>
>> Bottom line, what should happen when UPS batteries start to drain and
>> there is no prospect of reliable, high quality, utility power being
>> restored quickly? Leave it up and roll the dice about losing work
>> in progress and log data (head crashes and cache controller microcode
>> bugs) or shut it down cleanly?
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: GET IBM-MAIN INFO
> Search the archives at http://bama.ua.edu/archives/ibm-main.html

