[NTSysADM] Off-topic (Was RE: AWS East Outage)

Frank Ress Thu, 02 Mar 2017 15:38:19 -0800

Webster,

Mention of Gallipolis made me wonder.  Just a fun story below – read at your 
own risk.  I.e. if you don’t want to waste the time, delete this now.


FWIW, my sister married a guy from Gallipolis.

His grandfather was one of those guys who was buddies with everyone in town.  
He started an auto parts business that he built up, became pretty successful, 
and his son followed him into the business.

Before U.A. (the grandfather – everyone just knew him as UA) started his 
business, one of his buddies decided to go into the sausage business and asked 
UA to join him, but UA decided not to.  The buddy was Bob Evans.

My sis might have married into a sausage dynasty – so close.

I know discretion is always a good practice, so I won’t ask.  But Gallipolis 
isn’t really the hub of industry, and it wouldn’t surprise me to find out that 
being successful with sausage doesn’t keep you from making mistakes in IT.

Been lurking on the list for a long time now.  Been a while since I posted 
anything - maybe back to Winnt-L, can’t really remember for sure.   But I’m 
counting down toward retirement (another year or so).  Too easy to miss 
opportunities to say thanks, and I thought this would be as good a time as any 
to say that I’ve appreciated the advice, expertise, and conversation.

Frank

From: [email protected] [mailto:[email protected]] On 
Behalf Of Webster
Sent: Thursday, March 02, 2017 2:56 PM
To: [email protected]
Subject: RE: [EXTERNAL]Re: [NTSysADM] AWS East Outage

MBS,

Do you remember the project I was doing in Gallipolis, OH back in January 2010? 
It was like 10 degrees with a wind chill of 10 below. Some female went into the 
datacenter and decided it was too cold in there so she yanked the cover off the 
A/C system and turned it off. About 30 minutes later, alarms went off when the 
inside temp hit 85 degrees. Soon after, SANs, NAS units, servers and UPS 
systems started powering off. By the time we got there, the inside temp had 
reached over 100 degrees (not good for a full datacenter). The only things still powered 
on were a few UPS systems. Every server and storage system had hard shutdown.

The problems?

No servers were labeled
No one knew what servers should be powered up in which order
There were Unix systems no one knew what they did or what the logon creds were 
or even if they logged on, what to do
SANs were reporting failed drives, there were no replacement drives and the 
SANs were out of warranty
We found a fully populated HP SAN with fibre drives that had all the cables 
dangling in the back and not connected to a single server or device!!! (some HP 
sales rep made some money on that deal)
The HVAC unit was now frozen and could not be powered on so the HVAC team went 
on the roof and removed a portion of the roof to get cold air into the datacenter

That was a fun day. Customer wanted me for a 6-week project to help fix 
everything and I was terminated by my employer the next day due to lack of work 
to keep me busy?????

Webster

From: [email protected] [mailto:[email protected]] On 
Behalf Of Michael B. Smith
Sent: Thursday, March 2, 2017 2:36 PM
To: [email protected]
Subject: RE: [EXTERNAL]Re: [NTSysADM] AWS East Outage

OMG.

“we have not completely restarted the index subsystem or the placement 
subsystem in our larger regions for many years.”

That sentence scares me. But perhaps it shouldn’t.

From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of Kennedy, Jim
Sent: Thursday, March 2, 2017 3:12 PM
To: [email protected]<mailto:[email protected]>
Subject: RE: [EXTERNAL]Re: [NTSysADM] AWS East Outage

So the facts are out. Short version, basically someone fat fingered a command 
and deleted a bunch of really important servers.


https://aws.amazon.com/message/41926/


From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of Melvin Backus
Sent: Thursday, March 2, 2017 9:47 AM
To: [email protected]<mailto:[email protected]>
Subject: RE: [EXTERNAL]Re: [NTSysADM] AWS East Outage

That’s probably what caused the problem to begin with. All that conversion and 
somebody missed a decimal point.

--
There are 10 kinds of people in the world...
         those who understand binary and those who don't.

From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of David McSpadden
Sent: Thursday, March 2, 2017 7:17 AM
To: [email protected]<mailto:[email protected]>
Subject: RE: [EXTERNAL]Re: [NTSysADM] AWS East Outage

I believe it was a US-Converted-Metric S-ton IMHO.


From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of Richard Stovall
Sent: Thursday, March 2, 2017 7:05 AM
To: [email protected]<mailto:[email protected]>
Subject: [EXTERNAL]Re: [NTSysADM] AWS East Outage

Is that a metric S-ton, or the other kind?

There is a difference.

On Mar 2, 2017 2:38 AM, "Don Ely" <[email protected]<mailto:[email protected]>> 
wrote:
It is pretty trivial if you're set up correctly, but the setup takes an S-Ton of 
work and testing...

On Wed, Mar 1, 2017 at 3:30 PM Michael B. Smith 
<[email protected]<mailto:[email protected]>> wrote:
I have to say, what surprised me most about this outage was the lack of 
failover to alternate datacenters for some pretty big names.

I have no idea how this works in AWS, but in Azure it’s fairly trivial; I would 
expect the same of AWS.

From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]<mailto:[email protected]>] 
On Behalf Of Andrew S. Baker
Sent: Wednesday, March 1, 2017 12:22 PM

To: [email protected]<mailto:[email protected]>
Subject: Re: [NTSysADM] AWS East Outage

If not S3, then what?

You're always going to be relying on someone else's something.

Some data center provider (okay, so you might run your own)
Some power provider
Some Internet provider

It's not like they have internet outages every week, and it's not like various 
organizations relying upon them haven't had outages for their own reasons.

Technology breaks, which is why we RAID, cluster, backup, failover and farm our 
systems, devices and data centers.


Regards,



 ASB
 http://XeeMe.com/AndrewBaker<http://xeeme.com/AndrewBaker>

 Providing Expert Technology Consulting Services for the SMB market…

 GPG: 860D 40A1 4DA5 3AE1 B052 8F9F 07A1 F9D6 A549 8842



Sent with 
Mixmax<https://mixmax.com/s/WMB47Rd39yDNPFfWo?utm_source=mixmax&utm_medium=email&utm_campaign=signature_link&utm_content=sent_with_mixmax>






On Wed, Mar 1, 2017 8:37 AM, J- P 
[email protected]<mailto:[email protected]> wrote:

https://techcrunch.com/2017/03/01/the-day-amazon-s3-storage-stood-still/

Would / should you hold your IT vendor responsible for relying on S3?





Jean-Paul Natola


________________________________
From: [email protected]<mailto:[email protected]> 
<[email protected]<mailto:[email protected]>> on 
behalf of Andrew S. Baker <[email protected]<mailto:[email protected]>>
Sent: Tuesday, February 28, 2017 5:36 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: [NTSysADM] AWS East Outage

Indeed.


Regards,


 ASB
 GPG: 860D 40A1 4DA5 3AE1 B052 8F9F 07A1 F9D6 A549 8842









On Tue, Feb 28, 2017 3:56 PM, David McSpadden 
[email protected]<mailto:[email protected]> wrote:
So the normal question 'is the Internet down?' is valid today?

Sent from my iPhone

On Feb 28, 2017, at 3:44 PM, Andrew S. Baker 
<[email protected]<mailto:[email protected]>> wrote:
Indeed.

It's like someone broke the whole Internet.   Or, at least, 80% of it.


Regards,



 ASB
 http://XeeMe.com/AndrewBaker<http://xeeme.com/AndrewBaker>

 Providing Expert Technology Consulting Services for the SMB market…

 GPG: 860D 40A1 4DA5 3AE1 B052 8F9F 07A1 F9D6 A549 8842








On Tue, Feb 28, 2017 2:13 PM, Kennedy, Jim 
[email protected]<mailto:[email protected]> wrote:

Learning very quickly how many vendors we have that are using AWS.  Lots is the 
first word that comes to mind.



From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of Charles F Sullivan
Sent: Tuesday, February 28, 2017 1:57 PM
To: [email protected]<mailto:[email protected]>
Subject: [NTSysADM] AWS East Outage



Any of your organizations being affected by this? The few services we have 
moved there so far are down.

http://bgr.com/2017/02/28/internet-outage-amazon-web-services/





Charlie Sullivan

Sr. Windows Systems Administrator

Boston College

197 Foster St. Room 367

Brighton, MA 02135



This e-mail and any files transmitted with it are property of Indiana Members 
Credit Union, are confidential, and are intended solely for the use of the 
individual or entity to whom this e-mail is addressed. If you are not one of 
the named recipient(s) or otherwise have reason to believe that you have 
received this message in error, please notify the sender and delete this 
message immediately from your computer. Any other use, retention, 
dissemination, forwarding, printing, or copying of this email is strictly 
prohibited.


Please consider the environment before printing this email.


________________________________

This communication is for the use of the intended recipient only. It may 
contain information that is privileged and confidential. If you are not the 
intended recipient of this communication, the disclosure, copying, distribution 
or use hereof is prohibited. If you have received this communication in error, 
please advise me by return e-mail or by telephone and then delete it 
immediately.
