A month or so ago, I saw an article in the Boston Globe about horrible
things happening to the network over at Beth Israel/Deaconess.  The
hospital had no email or other electronic services for the better part
of three days, knocking the hospital back to the 1970s the hard way.
The Globe article was unusual both in that a major organization was
'fessing up to its mistakes and in the amount of technical detail it
included.  Apparently the main culprit in the collapse (aside from
cumulative mistakes and such) was the spanning tree algorithm.  I
thought this was very interesting and wrote to the Globe author asking
if there was more public information about what happened.

This is one of the original articles:

http://www.boston.com/dailyglobe2/323/metro/Hospital_computer_crash_a_lesson_to_the_industry+.shtml

This is not the article that mentioned spanning trees.  That one was a
week or so later in the Health/Science section.

Today the Globe author sent out a mass mailing to the many of us who
asked for more information.  She referenced this page at BI:

http://home.caregroup.org/templatesnew/departments/BID/network_outage

Here's some of the key information:

"When Cisco TAC was first able to access and assess the network, they
found the Layer 2 structure of the network to be unstable and out of
specification with 802.1d standards. The management vlan (vlan 1) had
in some locations 10 Layer2 hops from root.

"The conservative default values for the Spanning Tree Protocol (STP)
impose a maximum network diameter of seven. This means that two
distinct bridges in the network should not be more than seven hops
away from one to the other."
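
To make "hops from root" and "diameter" concrete, here is a minimal
Python sketch, assuming a hypothetical topology described as an
adjacency list of bridges.  It uses breadth-first search to count hops
and compares the worst case against the seven-hop figure quoted above.
The helper names (chain, hops_from, diameter) and the switch names are
made up for illustration; a real check would pull the topology from the
switches themselves.

```python
from collections import deque

# Seven-hop limit for the 802.1D default timers, as quoted above.
MAX_DIAMETER = 7


def chain(n):
    """Build a hypothetical daisy chain of n bridges: root - sw1 - ... - sw(n-1)."""
    names = ["root"] + [f"sw{i}" for i in range(1, n)]
    graph = {name: [] for name in names}
    for a, b in zip(names, names[1:]):
        graph[a].append(b)
        graph[b].append(a)
    return graph


def hops_from(graph, start):
    """Breadth-first search: hop count from start to every reachable bridge."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(graph):
    """Largest hop count between any two bridges (assumes a connected graph)."""
    return max(max(hops_from(graph, b).values()) for b in graph)


if __name__ == "__main__":
    for size in (8, 11):
        g = chain(size)
        d = diameter(g)
        verdict = "ok" if d <= MAX_DIAMETER else "too wide for default STP timers"
        farthest = max(hops_from(g, "root").values())
        print(f"{size} bridges in a line: diameter {d}, "
              f"root to farthest {farthest} hops ({verdict})")
```

Eleven switches strung in a line reproduce the "10 Layer 2 hops from
root" situation.  Note that this measures shortest paths over all
links; STP forwards only along the tree it builds, so hop counts on the
active topology can be larger than this sketch reports.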

There are other related pages on this site that detail different
aspects of the outage.  

We should be grateful to BI for making this information public so we
have examples of how things shouldn't be done and what it can cost to
make this kind of mistake.  This is also a good illustration of
something that network geeks have been telling me but that I didn't
understand at a gut level, namely the importance of keeping routing
and similar messes on layer 3, not layer 2.

Have fun,

Lauren

P.S.  Still looking for work.  http://www.linnaean.org/~lpb/r.html


---
Send mail for the `bblisa' mailing list to `[EMAIL PROTECTED]'.
Mail administrative requests to `[EMAIL PROTECTED]'.