Great story! I replaced a failed drive in an Exchange 5.5 server many years ago. No problem, right? Everything is hot swappable. We've replaced many drives with no problems.
Well, some unusual problem with the mobo and the controller caused the server to go down hard and upon bootup, the RAID configuration could not be identified. Many, many hours later (this was during production hours of course) we got the server back up with the original failed drive back in place. We subsequently scheduled an outage, replaced the mobo, and everything was just fine. I had a little sit-down with my boss, got a slight ribbing, nothing major. The only result of that whole scenario is that we now only replace drives outside of production hours. A one-in-a-million incident changed the way we do business... and in the end, probably for the better.

- Sean

On Fri, Aug 27, 2010 at 2:23 PM, Steven Peck <[email protected]> wrote:

> So many times the screw ups are small. Noticeable only to a few. However,
> there are those times, when you become 'that guy'. Those days and the weeks
> that follow just sort of suck.
>
> At a previous place, way back in the NT3.51 days, a DC had a hardware
> failure that was resolved but left the DC in a weird state. So the BDC was
> promoted and that blue screened and failed as well. The backup tapes were
> turned to and well... it really was a bad week for those two guys.
>
> So people were told no logging out, work on paper, pull out the disaster
> scenario plans and keep the business running. Three days and a bunch of
> desktop guys running around fixing broken trusts on desktop systems later
> (site had 8,000 desktops and a few thousand servers, only about 1,000
> desktops had to be physically touched) the disaster was over.
>
> A common statement was "I'll bet those guys got fired". The IT director
> stopped by our group once and stated, "I already paid for the mistake, why
> would I get rid of those who have learned the lesson?". Of course, they
> were staked to the wall for a few weeks in various meetings until some other
> issue allowed them to take off the burlap sacks.
>
> This was not a small company. It was a large international place whose
> products every person on this list uses.
>
> Sometimes, things just happen. All it takes is for the wrong 3 or 4 things
> to happen at the same time to make it worse.
>
> Steven
>
> On Fri, Aug 27, 2010 at 2:38 PM, Sean Martin <[email protected]> wrote:
>
>> I'm not overly concerned. We've been running a couple of CX700s for
>> about 5 years with no issues other than a few bad drives. I'm sure the
>> vendor of their hardware had a similar track record though. I'd still like
>> to know why bad memory would prompt them to fail over.
>>
>> - Sean
>>
>> On Fri, Aug 27, 2010 at 12:04 PM, Mayo, Bill <[email protected]> wrote:
>>
>>> I wouldn't worry. We have been using EMC CX-whatevers for over 7 years
>>> now. We have had hard drives die (naturally) and even once had a memory
>>> stick go bad. But never once has a hardware failure led to any downtime.
>>> The redundancy within CLARiiONs is extensive.
>>>
>>> Bill Mayo
>>>
>>> ------------------------------
>>> *From:* Sean Martin [mailto:[email protected]]
>>> *Sent:* Friday, August 27, 2010 3:51 PM
>>>
>>> *To:* NT System Admin Issues
>>> *Subject:* Re: Massive computer outage halts some Va. agencies
>>>
>>> Odd, I can hit the site just fine from Anchorage....
>>>
>>> Possibly. Can't wait to find out. We just bought an EMC CX4-960 earlier
>>> this year. I'm assuming they're using higher-end gear so it has me thinking
>>> what kind of failures we could be victim of.
>>>
>>> - Sean
>>>
>>> On Fri, Aug 27, 2010 at 11:44 AM, Ben Scott <[email protected]> wrote:
>>>
>>>> On Fri, Aug 27, 2010 at 2:17 PM, Sean Martin <[email protected]>
>>>> wrote:
>>>> >
>>>> http://hamptonroads.com.nyud.net/2010/08/massive-computer-outage-halts-some-va-agencies
>>>>
>>>> Unable to lookup the IP address of host name: hamptonroads.com.nyud.net
>>>> The DNS (Domain Name System) subsystem returned: Server Failure: The
>>>> name server was unable to process this query.
>>>>
>>>> > I'm really curious to know what SAN hardware they're using....
>>>>
>>>> Maybe they're using the same SAN system that ate Microsoft's
>>>> Sidekick database last year...
>>>>
>>>> -- Ben
>>>>
>>>> ~ Finally, powerful endpoint security that ISN'T a resource hog! ~
>>>> ~ <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/> ~
>>>>
>>>> ---
>>>> You are currently subscribed to ntsysadmin as: [email protected].
>>>> To unsubscribe click here:
>>>> http://lyris.sunbelt-software.com/u?id=7432680.6e6595761ad36f434042683031b94018&n=T&l=ntsysadmin&o=9077006
>>>> or send a blank email to
>>>> leave-9077006-7432680.6e6595761ad36f434042683031b94...@lyris.sunbelt-software.com
