On Thursday, May 4, we had a line of moderate to severe
thunderstorms move through our area (southeast Texas). We lost
commercial power. Nothing unusual about that; happens all the time.
Normally, the UPS carries us until the generator is up and stable. On
any power glitch, we commit to the generator and stay there for about an
hour past the last hit.

 

We do a full test every Friday. That is, we simulate dropping commercial
power and actually run off the generator for an hour or so. So far so
good. 

 

But this time two things went wrong. One, the generator did not start
(its batteries had failed). Two, commercial power did not return for over
four hours. As it turned out, the UPS batteries carried us for about 30
minutes, but we had no idea what to expect at the time. All we had left
were the building emergency lights on their own separate batteries.

 

A while back, I asked this august group's opinion as to what I should do
in just such an event. The consensus was to let the box run and hope for
either the generator or commercial power. If neither comes through, both
the hardware and software know what to do, and, left alone, will recover
nicely once power is back. Chances are there would be no outage.

 

I made that recommendation to management and they agreed. I did power
off all unnecessary equipment. Once all power was lost, I began to wonder
whether I should go ahead and attempt a shutdown anyway. Three long hours
passed.


 

We got the generator started about 10 minutes before commercial power
came back. The UPS, however, did not survive. Since more storms were in
the area, management elected to override the automatic controls and
remain on the generator.  

 

Anyway, I executed my power-up script. Two slightly unexpected things
happened. One, the 2086-0A4's HMC ran through a CHKDSK for over an hour,
so I used a Support Element to do the POR, the initial activates, and
eventually the IPLs.

 

The other was a blinking green power light on the 2105-800 Shark and a
status code 03 on the left controller. The Shark would not go ready. By
then it had been about an hour since power was available, about four
hours into the event, and panic was starting to color my thinking. We
opened a Sev 1 with IBM and opened our DR plan. As we began the first DR
steps, the blinking stopped, IBM arrived, and the Shark went ready (all
within a few seconds).

 

We IPL'ed (using the support element) and the major subsystems (DB2,
JES, MQ) hit the ground running. It was sweet. We had one little
application startup sequence issue, but no deviation from the procedure
was needed. We very smoothly went from the dark to doing business.  

 

Meanwhile, the IBM CE researched the status code and blinking power
light. He reported that the status meant the Shark was recharging
its batteries, and would not go ready until there was enough capacity to
get through another total power loss. How cool, we thought. Exactly what
we would want. He went on to say that the recharge could take up to 25
hours.

 

Gulp. Did he say 25 hours? Yup. 25 hours. Hmmmm.   

 

The bottom line (and moral) is that I would be willing to recommend
exactly the same thing again. I now know that it may take well over an
hour after power is restored, and that's OK. And what I am waiting for
is well worth the wait.

 

But there are two new players in the game. One, there is a shiny new
DS8100 in both the primary and DR sites waiting for power whips, and
two, a far more aggressive DR strategy is in the pipeline.

 

Sorry about the word count. Hope this adds value to your operation. You
folks have added much to mine.    

 

*Please* don't forget to trim this before replying or commenting. 

 

 Hal. 

 


----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
