On 6/6/23 09:27, Saku Ytti wrote:
I am not implying it is pragmatic or possible, just correct from a design point of view. Commercial software deals with competing requirements, and these requirements are not constructive towards producing maintainable, clean code. Over time commercial software becomes illiquid with its technical debt. There is no real personal reward for paying technical debt, because almost invariably it takes a lot of time, brings no new revenue, and the non-coder observing your work only sees the outages the debt repayment caused. Meanwhile another person, who creates this debt by producing new invoiceable features and bug fixes in ra[pb]id manner, is a star to the non-coder observers.

Not to say our open source networking is always great either. Linux developers are notorious for not asking SMEs 'how has this problem been solved in other software'. There are plenty of anecdotes to choose from, but I'll give one.

- In the 3.6 kernel, FIB was introduced to replace the flow-cache. Of course anyone dealing with networking could have told kernel developers on day 1 why the flow-cache was a poor idea, what FIB is, how it is done, and why it is a better idea.
- In the 3.6 FIB implementation, ECMP was solved by essentially randomly choosing 1 option of many, per-packet. Again they could have asked even junior network engineers 'how does ECMP work, how should it be done, I'm thinking of doing it like this, why do you think they've not done this in other software?' But they didn't.
- In 4.4, random ECMP was changed to hashed ECMP.

I still continue to catch discussions about poor TCP performance in Linux ECMP environments. I first ask what kernel they have, then I explain to them why per-packet + cubic will never ever perform. So for 4 years ECMP was completely broken, and reading the ECMP release notes in 4.4, not even the developers had completely understood just how bad the problem was, so we can safely assume people were not running ECMP.
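To make the per-packet versus hashed distinction concrete, here is a minimal Python sketch. It is not the kernel's code: the next-hop names, the example 5-tuple and the use of CRC32 as the hash are all assumptions chosen for illustration. It only shows that random per-packet selection sprays one TCP flow across several paths, while hashing the flow's 5-tuple pins it to one.

```python
import random
import zlib

# Hypothetical equal-cost next hops; the names are illustrative only.
NEXT_HOPS = ["nh-a", "nh-b", "nh-c", "nh-d"]

# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port).
FLOW = ("192.0.2.1", "198.51.100.7", 6, 49152, 443)


def pick_per_packet(next_hops, rng):
    """Per-packet ECMP: each packet independently gets a random next hop."""
    return rng.choice(next_hops)


def pick_hashed(next_hops, flow):
    """Hashed ECMP: hash the 5-tuple so every packet of a flow takes one path.

    CRC32 is a stand-in here, not the hash any particular kernel uses.
    """
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


def simulate(packets=1000, seed=1):
    """Return the next hop chosen for each packet of FLOW under both schemes."""
    rng = random.Random(seed)
    per_packet = [pick_per_packet(NEXT_HOPS, rng) for _ in range(packets)]
    hashed = [pick_hashed(NEXT_HOPS, FLOW) for _ in range(packets)]
    return per_packet, hashed


if __name__ == "__main__":
    per_packet, hashed = simulate()
    print("paths used, per-packet:", len(set(per_packet)))
    print("paths used, hashed:    ", len(set(hashed)))
```

With hashing, the single flow always reports one path. With per-packet selection it uses several, and if those paths differ in latency the packets arrive out of order. The receiver then sends duplicate ACKs, which a loss-based congestion control such as cubic is likely to treat as packet loss and respond to by shrinking its window.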
Another example was when I tried to explain to the OpenSSH mailing list that 'TOS' isn't a thing, and got a confident reply that TOS absolutely is a thing, prec/DSCP are not. Luckily a few years later Job fixed OpenSSH packet classification.

But these examples are everywhere, so it seems you either choose software written by people who understand the problem but are forced to write unmaintainable code, or you choose software by people who are just now learning about the problem and then solve it without discovering prior art, usually wrong.
I think being able to write code is one thing. Being able to write code to build and run an IP/MPLS network is - not a-whole-other - but another thing. I say this because people that know how to write code do not always understand how IP/MPLS networks work. And for better or worse, we need code to run the routers and switches that deliver IP/MPLS capability to network operators.
The reason traditional networking OEMs build usable code that allows us to run IP/MPLS networks is that their raison d'être is, well, shifting packets around the world as quickly as possible. General-purpose OS developers optimize for service/app performance, leaving the problem of network performance to the networking folk, for the most part. So it does not surprise me that developers who code for a general-purpose OS would think RIP is better than IS-IS, for example, just because it has the word "Routing" in it and they can write code for it. It's not because they don't know how to write code for IS-IS... they just don't have the organizational structure set up to care about why IS-IS is a better idea than RIP. Their organizational setup is app, app, app.
Unfortunately, not everybody can be a Cisco, Juniper, Google or AWS, who have the benefit of plenty of people who can more easily integrate writing code for its own sake with writing code for networking.
It is the reason most large-scale network operators will still continue to find more value in IOS XR, Junos, EOS, ArcOS, etc., than in, say, a NOS that was put together by someone who knows how to interpret an RFC and spit out an implementation on Linux, with zero understanding of the overall TCP/UDP/IP/MPLS/Ethernet stack and how it all ties in together at scale.
I like what folk like pfSense (Netgate) are doing with FRR, and also what folk like Mikrotik can pack into 13MB of software... but at a certain scale, you simply can't ignore the traditional networking OEMs, try as we might.
Mark.
_______________________________________________
juniper-nsp mailing list [email protected]
https://puck.nether.net/mailman/listinfo/juniper-nsp

