> Rewrites on MPLS are horrible from a memory perspective, as maintaining the 
> state and label transitions to explore all possible discrete paths across the 
> overall end-to-end path you are trying to take is hugely inefficient. 
> Applying circuit switching to a packet network was bad from the start. SR 
> doesn't resolve that, as you are stuck with a global label problem and the 
> associated lack of being able to engineer your paths, or a label stack 
> problem on ingress that means you need massive ASICs and memories there.
> 
> I don't think rewrites are horrible, just very flexible, and this *can* 
> come at a certain price. Regarding your memory argument that path 
> engineering in vanilla TE takes a lot of forwarding slots, we should remind 
> ourselves that this is not a design principle of MPLS. Discrete paths could 
> also be signalled in MPLS with shared link-labels, so that you end up with 
> the same big instructional headend packet as in SR. There are even 
> implementations offering this.

Except that is actually the problem if you look at it in hardware. And to be 
very specific, I'm talking about commodity hardware, not flexible pipelines 
like you find in the MX and a number of the ASRs. I'm also talking about the 
more recent approach of using Clos in PoPs instead of "big iron" or 
chassis-based systems. On those boxes, it's actually better not to do shared 
labels, as avoiding them pushes the ECMP decision to the ingress node. That 
does mean you have to enumerate every possible path (or some approximation) 
through the network; however, the action on the commodity gear is greatly 
reduced. It's a pure label swap, so you don't run into any egress next-hop 
problems there. You definitely do on the ingress nodes. Very, very badly 
actually.
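
To put a rough number on "very badly", here is a minimal sketch of the 
enumeration the ingress node ends up doing. The topology (4 spines, 2 WAN 
links, 4 spines), the node names, and the destination count are all made-up 
assumptions for illustration, not figures from a real network.

```python
from itertools import product

def enumerate_paths(stages):
    """Return every discrete path through a layered fabric.

    `stages` is a list of hop layers, each a list of node names.
    One node is picked per layer, so the path count is the product
    of the layer widths.
    """
    return list(product(*stages))

# Hypothetical topology: local PoP spines, WAN links, remote PoP spines.
stages = [
    [f"spine-a{i}" for i in range(4)],
    [f"wan{i}" for i in range(2)],
    [f"spine-b{i}" for i in range(4)],
]

paths = enumerate_paths(stages)

# Assumed number of remote destinations; each one needs its own
# set of per-path entries on the ingress node.
destinations = 50
ingress_entries = len(paths) * destinations

print(f"discrete paths: {len(paths)}")
print(f"ingress entries: {ingress_entries}")
```

The transit boxes each hold a single swap per label in this model, while the 
ingress state grows with the product of the layer widths times the number of 
destinations.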

So you can move to a shared label mode. Now the commodity boxes have to perform 
ECMP. That means they also have to have a unique ECMP group for every 
site/anycast label passing through them, as every label is being swapped 
differently. You get no reuse for two labels that are on identical paths 
because the "swaps" are not identical. So you run up against ECMP next-hop 
group starvation, forcing you to lower the radix and limiting the total 
site/anycast label count.
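
A small model of the group starvation described above, as a sketch only. It 
assumes an ECMP group is identified by its exact set of member rewrites 
(egress port, DMAC, outgoing label); real ASICs differ in how they key and 
share groups. The label values, port names, and MAC addresses are 
hypothetical.

```python
def ecmp_groups(labels, next_hops, swap):
    """Return the set of distinct ECMP groups needed for `labels`.

    A group is modelled as the frozenset of its member rewrites:
    (egress port, destination MAC, outgoing label).
    """
    groups = set()
    for label in labels:
        members = frozenset(
            (port, dmac, swap(label, port)) for port, dmac in next_hops
        )
        groups.add(members)
    return groups

# Four equal-cost next hops shared by every label (hypothetical values).
next_hops = [
    ("eth1", "00:00:5e:00:53:01"),
    ("eth2", "00:00:5e:00:53:02"),
    ("eth3", "00:00:5e:00:53:03"),
    ("eth4", "00:00:5e:00:53:04"),
]
labels = range(16000, 16100)  # 100 site/anycast labels on identical paths

# Each incoming label is swapped to its own outgoing label.
per_label = ecmp_groups(labels, next_hops, swap=lambda label, port: label + 1000)

# For contrast: a rewrite with no label-specific part (port and DMAC only).
label_free = ecmp_groups(labels, next_hops, swap=lambda label, port: None)

print(f"groups with per-label swaps: {len(per_label)}")
print(f"groups with label-free rewrite: {len(label_free)}")
```

With per-label swaps the group count tracks the label count even though all 
100 labels use the same four next hops, which is the reuse problem in a 
nutshell.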

> IP at least gives you rewrite sharing, so in a lite-core you have a way 
> better trade-off on resources, especially in a heavily ECMP'ed network, such 
> as one built of a massive number of open small boxes vs. a small number of 
> huge opaque ones. Pick your poison, but saying one is inherently better than 
> another in all cases is just plain false.
> 
> If I understand this argument correctly, then it shouldn't be one, because 
> "rewrite sharing" is irrelevant for the addressability of single nodes in 
> a BGP network. Why a header lookup depth of 4B per label in engineered and 
> non-engineered paths should be a bad requisite for h/w designers of modern 
> networks is beyond me. In most MPLS networks (unengineered L3VPN) you need to 
> read less of the headers than in, e.g., a VXLAN fabric to make ECMP work 
> (24B vs. 20B).

What I'm getting at is that IP allows rewrite sharing: two IP frames taking 
the same path but ultimately reaching different destinations have the same 
fields (e.g. DMAC, egress port) rewritten identically. And, at least with 
IPIP, you are able to look at the inner frame for ECMP calculations. Depending 
on your MPLS design, that may not be the case. If you have too deep a label 
stack (3-5 labels depending on ASIC), you can't look at the payload and you 
end up with polarization.
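
A minimal sketch of that polarization effect. The parse limit of 3 labels, 
the CRC32 hash, and the label and address values are assumptions for 
illustration; actual ASICs use their own hash functions and field 
selection.

```python
import zlib

MAX_PARSE_LABELS = 3  # hypothetical ASIC parse-depth limit

def ecmp_key(label_stack, inner_flow, max_labels=MAX_PARSE_LABELS):
    """Build the ECMP hash key the parser can actually see.

    If the stack is deeper than the parser can walk, the inner
    5-tuple is out of reach and only the visible labels are hashed.
    """
    key = list(label_stack[:max_labels])
    if len(label_stack) <= max_labels:
        key.extend(inner_flow)
    return tuple(key)

def pick_member(key, n_members):
    """Map a hash key onto one of n_members ECMP members."""
    return zlib.crc32(repr(key).encode()) % n_members

# 64 distinct inner flows: (src IP, dst IP, protocol, src port, dst port).
flows = [
    ("192.0.2.1", f"198.51.100.{i}", 6, 40000 + i, 443) for i in range(64)
]

shallow_stack = [100, 200]               # within the parse limit
deep_stack = [100, 200, 300, 400, 500]   # beyond the parse limit

shallow_members = {pick_member(ecmp_key(shallow_stack, f), 4) for f in flows}
deep_members = {pick_member(ecmp_key(deep_stack, f), 4) for f in flows}

print(f"members used, shallow stack: {len(shallow_members)}")
print(f"members used, deep stack: {len(deep_members)}")
```

With the deep stack every flow produces the same key, so all 64 flows land on 
a single member regardless of how the inner headers differ.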

David

_______________________________________________
cisco-nsp mailing list  [email protected]
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
