Many data centers don't use SPF; instead they follow RFC 7938 (BGP for routing in large-scale data centers).

On Tue, Aug 19, 2025 at 11:34 PM Saku Ytti via NANOG <[email protected]> wrote:
> On Mon, 18 Aug 2025 at 21:22, Matthew Petach via NANOG
> <[email protected]> wrote:
>
> > I don't know of many networks that choose link costs to ensure the
> > resulting uniqueness of the cumulative cost through the path. Indeed,
> > ECMP is taken to be an assumption for most IGPs we use in the real
> > world.
>
> That is funny, and of course we can beat Dijkstra massively if we can
> make assumptions for specific environments, which is arguably what
> engineering is: taking advantage of environment constants that allow
> assumptions which yield optimisations.
>
> How is SPF run today? I have no clue, because the modern approach to
> convergence is not to converge fast, but to converge before the fault,
> which is not something Dijkstra does. The naive approach would be to
> just run SPF many, many times, removing failed nodes and edges from
> the topology to recover the post-convergence topology and loop-free
> alternative paths.
> But there may well exist some domain-specific solution which is
> cheaper when you need to recover both the best current path and the
> best post-convergence paths. Whether such an algorithm is actually
> used, or whether the much more antifragile approach of throwing
> compute at it and running SPF as many times as it takes is used, I
> have no idea.
>
> In Junos a few years back they enabled out of the box the
> infrastructure for this post-fault convergence, regardless of whether
> or not you chose to install the backup paths.
> How this is implemented in practice is that the same structure that
> ECMP uses is used for backup paths; the backup path is just programmed
> in the hardware at a worse weight, so it is excluded as an ECMP option
> during the lookup. However, because the infrastructure is still
> enabled, if for example an interface flaps, the hardware will
> invalidate the best ECMP option, and the next-best option (if any)
> becomes valid.
>
> In practice, what happened after Juniper enabled that infrastructure
> is that we started to get a lot of bugs where a network event was
> followed by a blackholing event. These were largely caused by software
> omitting the hardware reprogramming: when something happens
> sufficiently fast that software hasn't had time to invalidate the best
> option, software will prune the invalid+valid set before it enters
> hardware. Which is a good optimisation, unless you've now added the
> capability in hardware to invalidate an adjacency without software.
> To our surprise, the Junos code has accumulated so much technical debt
> that Juniper doesn't actually know every place in the code where this
> could happen. We raised a separate issue to figure out why so many
> similar bugs occurred to us, and Juniper came back with an answer that
> paraphrases as 'we just have to find all the bugs where this can
> happen'. Naively you'd want all of these to go through one function
> call, so you fix the bug once there, but apparently the codebase is
> far less clean, so they cannot deterministically say whether all of
> those cases are fixed or not.
> In my experience it used to be super rare in Junos for HW and SW to
> disagree, while it was extremely common on the PFC3.
> We've not seen this type of bug in a year or two, so maybe most are
> fixed.
>
> But certainly if you are running MPLS you can have 100% coverage for
> all faults: if a post-convergence path exists, you can utilise it
> immediately after hardware detects the fault (link down), without
> waiting for software. This makes SPT performance quite uninteresting,
> if rapid convergence is the goal.
> --
> ++ytti
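
For what it's worth, the "just run SPF as many times as it takes"
approach Saku describes above is easy to sketch. Below is a rough
Python illustration, not anything a vendor ships; the graph
representation and function names are my own. It computes the normal
shortest-path first hops with Dijkstra, then re-runs the same SPF once
per local link with that link pruned, which yields the post-convergence
first hops you would pre-install as backups.

```python
import heapq

def dijkstra(graph, src):
    """Plain Dijkstra. graph is {node: {neighbor: cost}}.
    Returns (distance, first_hop) for each reachable node."""
    dist = {src: 0}
    first_hop = {}
    pq = [(0, src, None)]
    while pq:
        d, node, hop = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry
        if hop is not None:
            first_hop.setdefault(node, hop)
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # first hop is the neighbour itself when leaving src
                heapq.heappush(pq, (nd, nbr, hop if hop is not None else nbr))
    return dist, first_hop

def without_link(graph, a, b):
    """Copy of the topology with link a<->b removed."""
    pruned = {n: dict(nbrs) for n, nbrs in graph.items()}
    pruned.get(a, {}).pop(b, None)
    pruned.get(b, {}).pop(a, None)
    return pruned

def precompute_backups(graph, src):
    """Naive pre-fault convergence: one extra SPF per local link.
    Returns {failed_link: {dest: post-convergence first hop}}."""
    backups = {}
    for nbr in graph.get(src, {}):
        _, hops = dijkstra(without_link(graph, src, nbr), src)
        backups[(src, nbr)] = hops
    return backups

# Toy four-node topology; costs are arbitrary.
topo = {
    "A": {"B": 10, "C": 10},
    "B": {"A": 10, "D": 10},
    "C": {"A": 10, "D": 20},
    "D": {"B": 10, "C": 20},
}
_, primary = dijkstra(topo, "A")
backups = precompute_backups(topo, "A")
print(primary)              # best first hops before any fault
print(backups[("A", "B")])  # first hops to use if link A-B fails
```

That is one SPF per protected link; a domain-specific algorithm could
share work between those runs, but the brute-force version is simple
and, as Saku says, you just throw compute at it.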
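The backup-path mechanism he describes (same structure as ECMP, backup
entry programmed at a worse weight, hardware invalidating an entry on
link-down without waiting for software) can also be modelled in a few
lines. This is only a toy model of the concept, not Juniper's actual
data structures, and the names are made up. It also shows the bug class
mentioned: if software prunes the worse-weight entries before the group
is programmed, a hardware-side invalidation leaves nothing to fall back
on and the prefix blackholes until software catches up.

```python
from dataclasses import dataclass, field

@dataclass
class NextHop:
    interface: str
    weight: int          # lower is better; equal-best entries form ECMP
    valid: bool = True   # hardware can clear this on link-down, no software

@dataclass
class NextHopGroup:
    """Toy model of an ECMP/backup structure as programmed into hardware."""
    entries: list = field(default_factory=list)

    def active(self):
        """Entries actually used for forwarding: best weight among valid ones."""
        valid = [e for e in self.entries if e.valid]
        if not valid:
            return []    # nothing valid left -> traffic blackholes
        best = min(e.weight for e in valid)
        return [e for e in valid if e.weight == best]

    def link_down(self, interface):
        """What hardware does on its own when a link goes down."""
        for e in self.entries:
            if e.interface == interface:
                e.valid = False

# Primary path at weight 1, pre-installed backup at weight 100.
group = NextHopGroup([NextHop("et-0/0/0", 1), NextHop("et-0/0/1", 100)])
print([e.interface for e in group.active()])   # ['et-0/0/0']

group.link_down("et-0/0/0")                    # hardware invalidates, no software
print([e.interface for e in group.active()])   # ['et-0/0/1'] -- backup takes over

# The bug class: software "helpfully" prunes entries before programming
# the group, so a hardware-side invalidation leaves nothing behind it.
pruned = NextHopGroup([NextHop("et-0/0/0", 1)])
pruned.link_down("et-0/0/0")
print([e.interface for e in pruned.active()])  # [] -- blackhole until software reconverges
```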
