No one had hit the IS-IS bug before the IETF-enforced maintenance freeze because no one in their right mind would be running three-week-old code back then. I don't think things have changed that much. ;)
-dorian

On Feb 7, 2013, at 4:19 PM, Siegel, David wrote:

> I remember being glued to my workstation for 10 straight hours due to an OSPF
> bug that took down the whole of net99's network.
>
> I was pretty proud of our size at the time...about 30Mbps at peak. Times are
> different and so are expectations. :-)
>
> Dave
>
>
> -----Original Message-----
> From: Brett Watson [mailto:[email protected]]
> Sent: Wednesday, February 06, 2013 6:07 PM
> To: [email protected]
> Subject: Re: Level3 worldwide emergency upgrade?
>
> Hell, we used to not have to bother notifying customers of anything, we just
> fixed the problem. Reminds me of a story I've probably shared in the past.
>
> 1995, IETF in Dallas. The "big ISP" I worked for at the time got tripped up
> on a 24-day IS-IS timer bug (maybe all of them at the time did, I don't
> recall) where all adjacencies reset at once. That's like, entire network
> down. Working with our engineering team in the *terminal* lab mind you, and
> Ravi Chandra (then at Cisco) we reloaded the entire network of routers with
> new code from Cisco once they'd fixed the bug. I seem to remember this being
> my first exposure to Tony Li's infamous line, "... Confidence Level: boots in
> the lab."
>
> Good times.
>
> -b
>
>
> On Feb 6, 2013, at 5:41 PM, Brandt, Ralph wrote:
>
>> David. I am on an evening shift and am just now reading this thread.
>>
>> I was almost tempted to write an explanation that would have had
>> identical content with yours based simply on Level3 doing something
>> and keeping the information close.
>>
>> Responsible Vendors do not try to hide what is being done unless it is
>> an Op Sec issue and I have never seen Level3 act with less than
>> responsibility so it had to be Op Sec.
>>
>> When it is that, it is best if the remainder of us sit quietly on the
>> sidelines.
>>
>> Ralph Brandt
>>
>>
>> -----Original Message-----
>> From: Siegel, David [mailto:[email protected]]
>> Sent: Wednesday, February 06, 2013 12:01 PM
>> To: 'Ray Wong'; [email protected]
>> Subject: RE: Level3 worldwide emergency upgrade?
>>
>> Hi Ray,
>>
>> This topic reminds me of yesterday's discussion in the conference
>> around getting some BCOPs drafted. It would be useful to confirm my
>> own view of the BCOP around communicating security issues. My
>> understanding for the best practice is to limit knowledge distribution
>> of security-related problems both before and after the patches are
>> deployed. You limit knowledge before the patch is deployed to prevent
>> yourself from being exploited, but you also limit knowledge afterwards
>> in order to limit potential damage to others (customers,
>> competitors...the Internet at large). You also do not want to
>> announce that you will be deploying a security patch until you have a
>> fix in hand and know when you will deploy it (typically, next
>> available maintenance window unless the cat is out of the bag and
>> danger is real and imminent).
>>
>> As a service provider, you should stay on top of security alerts from
>> your vendors so that you can make your own decision about what action
>> is required. I would not recommend relying on service provider
>> maintenance bulletins or public operations mailing lists for obtaining
>> this type of information. There is some information that can cause
>> more harm than good if it is distributed in the wrong way and
>> information relating to security vulnerabilities definitely falls into
>> that category.
>>
>> Dave
>>
>> -----Original Message-----
>> From: Ray Wong [mailto:[email protected]]
>> Sent: Wednesday, February 06, 2013 9:16 AM
>> To: [email protected]
>> Subject: Re: Level3 worldwide emergency upgrade?
>>
>> OK, having had that first cup of coffee, I can say perhaps the main
>> reason I was wondering is I've gotten used to Level3 always being on
>> top of things (and admittedly, rarely communicating). They've reached
>> the top by often being a black box of reliability, so it's (perhaps
>> unrealistically) surprising to see them caught by surprise. Anything
>> that pushes them into scramble mode causes me to lose a little sleep
>> anyway. The alternative to what they did seems likely for at least a
>> few providers who'll NOT manage to fix things in time, so I may well
>> be looking at longer outages from other providers, and need to issue
>> guidance to others on what to do if/when other links go down for
>> periods long enough that all the cost-bounding monitoring alarms start
>> to scream even louder.
>>
>> I was also grumpy at myself for having not noticed advance
>> communication, which I still don't seem to have, though since I
>> outsourced my email to bigG, I've noticed I'm more likely to miss
>> things. Perhaps giving up maintaining that massive set of procmail
>> rules has cost me a bit more edge.
>>
>> Related, of course, just because you design/run your network to
>> tolerate some issues doesn't mean you can also budget to be in support
>> contract as well. :) Knowing more about the exploit/fix might mean
>> trying to find a way to get free upgrades to some kit to prevent more
>> localized attacks to other types of gear, as well, though in this case
>> it's all about Juniper PR839412 then, so vendor specific, it seems?
>>
>> There are probably more reasons to wish for more info, too. There's
>> still more of them (exploiters/attackers) than there are those of us
>> trying to keep things running smoothly and transparently, so anything
>> that smells of "OMG new exploit found!" also triggers my desire to
>> share information. The network bad guys share information far more
>> quickly and effectively than we do, it often seems.
>>
>> -R

