Many data centers don't use SPF at all; instead they run BGP as the routing protocol, per RFC 7938.

On Tue, Aug 19, 2025 at 11:34 PM Saku Ytti via NANOG <[email protected]>
wrote:

>  On Mon, 18 Aug 2025 at 21:22, Matthew Petach via NANOG
> <[email protected]> wrote:
>
> > I don't know of many networks that choose link costs to ensure that the
> > resulting cumulative cost through the path is unique.  Indeed, ECMP is
> > taken to be an assumption for most IGPs we use in the real world.
>
> That is funny, and of course we can beat Dijkstra massively if we can
> make assumptions for specific environments, which is arguably what
> engineering is: taking advantage of environment constants that allow for
> assumptions which yield optimisations.
>
> How is SPF run today? I have no clue, because the modern approach to
> convergence is not to converge fast, but to converge before the fault.
> Which is not something Dijkstra does. The naive approach would be to
> just run SPF many, many times, removing the failed nodes and edges from
> the topology to recover the post-convergence topology and loop-free
> alternate paths.
> But absolutely there exists some domain-specific solution which is
> cheaper when you need to recover both the best current path and the best
> post-convergence paths. Whether such an algorithm is actually used, or
> whether the much more antifragile approach of throwing compute at it
> and running SPF as many times as it takes is used, I have no idea.
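
To make that naive approach concrete, here is a rough sketch of "one SPF
run per failure" in Python, assuming a simple adjacency-map topology.
Everything here (names, structure) is illustrative only, not any vendor's
implementation, and it keeps a single first hop per destination rather
than the full ECMP set:

import heapq

def spf(topology, source):
    # Plain Dijkstra over topology = {node: {neighbor: cost}}.
    # Returns {node: (cost, first_hop)} for every reachable node.
    results = {}
    heap = [(0, source, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in results:
            continue
        results[node] = (cost, first_hop)
        for neighbor, link_cost in topology.get(node, {}).items():
            if neighbor not in results:
                # Direct neighbors of the source become their own first hop.
                hop = neighbor if first_hop is None else first_hop
                heapq.heappush(heap, (cost + link_cost, neighbor, hop))
    return results

def per_link_backups(topology, source):
    # Naive "converge before the fault": rerun SPF once per link, with
    # that link removed, and keep the post-convergence first hops.
    backups = {}
    for node, neighbors in topology.items():
        for neighbor in neighbors:
            pruned = {n: {m: c for m, c in adj.items()
                          if {n, m} != {node, neighbor}}
                      for n, adj in topology.items()}
            backups[(node, neighbor)] = spf(pruned, source)
    return backups
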
>
> In Junos, a few years back, they enabled out of the box the
> infrastructure for this post-fault convergence, regardless of whether or
> not you chose to install the backup paths.
> In practice this is implemented by reusing the same structure that ECMP
> uses for the backup paths; the backup path is just programmed in the
> hardware at a worse weight, so it is excluded as an ECMP option during
> the lookup. However, because the infrastructure is still enabled, if for
> example an interface flaps, the HW will invalidate the best ECMP option,
> and the next-best (if any) becomes valid.
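
That weighted next-hop group is easy to picture with a toy model (purely
illustrative Python, not Junos internals): the backup sits in the same
group at a worse weight, the lookup only uses valid entries at the best
weight, and once the primaries are marked invalid the backup is what's
left:

from dataclasses import dataclass, field
from typing import List

@dataclass
class NextHop:
    interface: str
    weight: int           # lower is better; ECMP primaries share the best weight
    valid: bool = True    # hardware can clear this on link-down, without software

@dataclass
class NextHopGroup:
    entries: List[NextHop] = field(default_factory=list)

    def lookup(self):
        # The ECMP set is every valid entry at the best (lowest) valid weight.
        # Backups at a worse weight are excluded until the primaries go invalid.
        candidates = [e for e in self.entries if e.valid]
        if not candidates:
            return []     # nothing valid left: traffic blackholes
        best = min(e.weight for e in candidates)
        return [e for e in candidates if e.weight == best]

group = NextHopGroup([NextHop("et-0/0/0", 1),     # primary ECMP member
                      NextHop("et-0/0/1", 1),     # primary ECMP member
                      NextHop("et-0/0/2", 100)])  # pre-installed backup, worse weight

print([e.interface for e in group.lookup()])   # ['et-0/0/0', 'et-0/0/1']
group.entries[0].valid = False                 # link down, invalidated in hardware
group.entries[1].valid = False                 # link down, invalidated in hardware
print([e.interface for e in group.lookup()])   # ['et-0/0/2'] -- backup takes over
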
>
> In practice, what happened after Juniper enabled that infrastructure is
> that we started to get a lot of bugs where, after a network event, we
> had a blackholing event. These were largely caused by software omitting
> the reprogramming of hardware when something happens fast enough that
> software didn't have time to invalidate the best option; software then
> prunes the invalid+valid pair before it ever reaches hardware. Which is
> a good optimisation, unless you've now added the capability in the
> hardware to invalidate an adjacency without software.
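
If I read that right, the race looks roughly like this (again just an
illustration of the logic, nothing to do with actual Junos code):
software coalesces a fast down+up flap into "no change" and skips
reprogramming, but the hardware has already invalidated the entry on its
own, so the two views diverge:

def reconcile(sw_thinks_valid, events):
    # Software-side coalescing: replay the queued interface events and,
    # if the end state equals what software already programmed, prune the
    # update entirely -- a sound optimisation while software is the only
    # writer of hardware state.
    state = sw_thinks_valid
    for event in events:
        state = (event == "up")
    return None if state == sw_thinks_valid else state

hw_valid = True      # hardware's view of the best ECMP entry
hw_valid = False     # link flaps down: hardware invalidates it by itself

# The flap was fast enough that software processes "down" and "up" together.
update = reconcile(sw_thinks_valid=True, events=["down", "up"])
if update is not None:
    hw_valid = update            # never runs: software pruned the change

print(hw_valid)      # False -- hardware still considers the path dead,
                     # even though software thinks it is fine
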
> To our surprise, the Junos code has accumulated so much technical debt
> that Juniper doesn't actually know every place in the code where this
> could happen. We raised a separate issue to figure out why so many
> similar bugs were hitting us, and Juniper came back with an answer which
> is paraphrased as 'we just have to find all the bugs where this can
> happen'. Naively you'd want all of these to go through one function
> call, so you could fix the bug once there, but apparently the codebase
> is far less clean, so they cannot deterministically say whether all of
> those cases are fixed or not.
> In my experience it used to be super rare in Junos for HW and SW to
> disagree, while it was extremely common in PFC3. We've not seen this
> type of bug in a year or two, so maybe most are fixed.
>
> But certainly, if you are running MPLS, you can have 100% coverage for
> all faults: if a post-convergence path exists, you can utilise it
> immediately after the hardware detects the fault (link down), without
> waiting for software. This makes SPT performance quite uninteresting,
> if rapid convergence is the goal.
>
>
>
> --
>   ++ytti