> Saku Ytti
> Sent: Tuesday, November 17, 2020 6:55 AM
> 
> On Tue, 17 Nov 2020 at 03:40, Sabri Berisha <sa...@cluecentral.net> wrote:
> 
> Hey Sabri,
> 
> > Also, in the case that I described it wasn't a Junos device. Makes me
> > wonder how bugs like that get introduced. One would expect that after
> > 20+ years of writing BGP code, handling a withdrawal would be easy-peasy.
> 
> I don't think this is related to skill, that there was some hard programming
> problem that DE couldn't solve. These are honest mistakes.
> I've not experienced in my tenure the frequency of these bugs change at all,
> NOS bugs are as common now as they were in the 90s.
> 
> I put most of the blame on the market, we've modelled commercial router
> market so that poor quality NOS is good for business and good quality NOS is
> bad for business, I don't think this is in anyone's formal business plan or that
> companies even realise they are not even trying to make good NOS. I think it's
> emergent behaviour due to the market and people follow that market demand
> unknowingly.
> If we suddenly had one commercial NOS which is 100% bug free, many of their
> customers would stop buying support, would rely on spare HW and Internet
> forums for configuration help. Lot of us only need contracts to deal with novel
> bugs all of us find on a regular basis, so good NOS would immediately reduce
> revenue. For some reason Windows, macOS or Linux almost never have novel
> bugs that the end user finds and when those are found, it's big news. While we
> don't go a month without hitting a novel bug in one of our NOS, and no one
> cares about it, it's business as usual.
> 
> I also put a lot of blame on C, it was a terrific language when compiling had to
> be fast. Basically macro assembler. Now the utility of being 'close to HW' is
> gone, as the CPU does so much the C compiler has no control over, it's not really
> even executing the same code as-written anymore. MSFT estimated >70% of
> their bugs are related to memory safety. We could accomplish significant
> improvements in software quality if we'd ditch C and allow the computer to do
> more formal correctness checks at compile time and design languages which
> lend towards this.
> 
> 
> We constantly misattribute problems (like in this post) to config or HW, while
> most common reasons for outages are pilot error and SW defect, and very little
> engineering time is spent on those. And often the time spent improving the two
> first increases the risk of the two latter, reducing mean availability over time.
> 
I agree with everything but the last statement. 
From my experience, most SPs spend considerable time testing for SW
defects on the features (and combinations of features) that will be used, at
the scale intended, and that's how you identify most of the bugs. What you're
left with afterwards are special packets of death or some slow memory leaks
(basically the more exotic stuff).
 
adam
 
