CoS will not work on the SD ports.

On 15 Nov 2018, at 04:51, Hugo Slabbert <[email protected]> wrote:

>> This was all while talking about a data center redesign that we are working 
>> on currently.  Replacing ToR VC EX4550’s connected LAG to ASR9K with new 
>> dual QFX5120 leaf to single MX960, dual MPC7E-MRATE
>> 
>> I think we will connect each QFX to each mpc7e card.  Is it best practice to 
>> not interconnect directly between the two QFX’s ? If so why not.
> 
> Glib answer: because then it's not spine & leaf anymore ;)
> 
> Less glib answer:
> 
> 1. it's not needed and is suboptimal
> 
> Going with a basic 3-stage (2 layer) spine & leaf, each leaf is connected to 
> each spine.  Connectivity between any two leafs is via any spine to which 
> they are both connected.  Suppose you have 2 spines, spine1 and spine2, and, 
> say, 10 leaf switches. If a given leaf loses its connection to spine1, it 
> would then just reach all other leafs via spine2.
> 
> If you add a connection between two spines, you do create an alternate path, 
> but it's also not an equal-cost or optimal path.  If we're going by simple 
> least hops / shortest path, and leaf1's connection to spine1 is lost, in 
> theory leaf2 could reach leaf1 via:
> 
> leaf2 -> spine1 -> spine2 -> leaf1
> 
> ...but that would be a longer path than just going via the remaining:
> 
> leaf2 -> spine2 -> leaf1
> 
> ...path.  You could force it through the longer path, but why?
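[Inline aside: that path-length argument can be sketched with a hop-count BFS over a toy 2-spine/2-leaf fabric. The node names and topology here are illustrative, not anything from a real config.]

```python
from collections import deque

# Toy fabric: leaf1's uplink to spine1 has failed; the spines are
# interconnected anyway, so the longer detour path exists.
adj = {
    "leaf1":  {"spine2"},
    "leaf2":  {"spine1", "spine2"},
    "spine1": {"leaf2", "spine2"},
    "spine2": {"leaf1", "leaf2", "spine1"},
}

def shortest_path(src, dst):
    """Hop-count BFS; returns the first shortest path found."""
    q = deque([[src]])
    seen = {src}
    while q:
        path = q.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj[path[-1]] - seen:
            seen.add(nxt)
            q.append(path + [nxt])

print(shortest_path("leaf2", "leaf1"))
# -> ['leaf2', 'spine2', 'leaf1']
# The 2-hop path via spine2 wins; the leaf2 -> spine1 -> spine2 -> leaf1
# detour over the spine xconnect is a hop longer and never selected.
```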
> 
> 2. What's your oversub?
> 
> The pitch on spine & leaf networks is generally their high bandwidth, high 
> availability (lots of links), and low oversubscription ratios.  For the sake 
> of illustration let's go away from chassis gear for spines to a simpler 
> option like, say, 32x100G Tomahawk spines.  The spines there have capacity to 
> connect 32x leaf switches at line rate.  Whatever connections the leaf 
> switches have to the spines do not have any further oversub imposed within 
> the spine layer.
> 
> Now you interconnect your spines.  How many of those 32x 100G ports are you 
> going to dedicate to spine interconnect?  2 links?  If so, you've now dropped 
> the capacity for 2x more leafs in your fabric (and however many compute nodes 
> they were going to connect), and you're also only providing 200G interconnect 
> between spines for 3 Tbps of leaf connection capacity.  Even if you ignore 
> the less optimal path thing from above and try to intentionally force a 
> fallback path on spine:leaf link failure to traverse your spine xconnect, you 
> can impose up to 15:1 oversub in that scenario.
> 
> Or you could kill the oversub and carve out 16x of your 32x spine ports for 
> spine interconnects.  But now you've shrunk your fabric significantly (can 
> only support 16 leaf switches)...and you've done so unnecessarily because the 
> redundancy model is for leafs to use their uplinks through spines directly 
> rather than using inter-spine links.
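[Inline aside: the arithmetic behind those oversub figures, as a quick sketch. The 32x100G spine is the one assumed in the text above; the function name is mine.]

```python
SPINE_PORTS = 32
PORT_GBPS = 100

def worst_case_oversub(interconnect_ports):
    """Worst-case oversub if fallback traffic on spine:leaf link failure
    is forced across the spine xconnect."""
    leaf_ports = SPINE_PORTS - interconnect_ports
    leaf_capacity = leaf_ports * PORT_GBPS            # leaf-facing Gbps
    xconnect_capacity = interconnect_ports * PORT_GBPS
    return leaf_capacity / xconnect_capacity

print(worst_case_oversub(2))   # 30 leaf ports (3 Tbps) over 200G -> 15.0
print(worst_case_oversub(16))  # 1.0, but only 16 leaf ports remain
```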
> 
> 3. >2 spines
> 
> What if leaf1 loses its connection to spine2 and leafx loses its 
> connection to spine1?  Have we not created a reachability problem?
> 
>     spine1     spine2
>       |           |
>       |           |
>     leaf1       leafx
> 
> Why, yes we have.  The design solution here is either >1 links between each 
> leaf & spine (cheating; blergh) or a greater number of spines.  What's your 
> redundancy factor?  Augment the above to 4x spines and you've significantly 
> shrunk your risk of creating connectivity islands.
> 
> But if you've designed for interconnecting your spines, what do you do for 
> interconnecting 4x spines?  What about if you reach 6x spines?  Again: the 
> model is that resilience is achieved at the leaf:spine interconnectivity 
> rather than at the "top of the tree" as you would have in a standard 
> hierarchical, 3-tier-type setup.
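[Inline aside: the island scenario reduces to "do the two leafs still share a working spine?" A minimal check, with the same illustrative failures as the diagram above (leaf1 loses spine2, leafx loses spine1):]

```python
def leafs_reachable(n_spines, failed_links):
    """Can leaf1 reach leafx via a spine both still connect to?
    failed_links is a set of (leaf, spine) links that are down."""
    spines = {f"spine{i}" for i in range(1, n_spines + 1)}
    up1 = {s for s in spines if ("leaf1", s) not in failed_links}
    upx = {s for s in spines if ("leafx", s) not in failed_links}
    return bool(up1 & upx)  # any shared working spine = reachable

failures = {("leaf1", "spine2"), ("leafx", "spine1")}
print(leafs_reachable(2, failures))  # False: no common spine -> islands
print(leafs_reachable(4, failures))  # True: spine3/spine4 still shared
```

Same two failures, but with 4 spines the leafs still share two healthy uplink paths, which is the redundancy-factor point.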
> 
> -- 
> Hugo Slabbert       | email, xmpp/jabber: [email protected]
> pgp key: B178313E   | also on Signal
> 
>> On Tue 2018-Nov-06 12:38:22 -0600, Aaron1 <[email protected]> wrote:
>> 
>> This is a timely topic for me as I just got off a con-call yesterday with my 
>> Juniper SE and an SP specialist...
>> 
>> They also recommended EVPN as the way ahead in place of things like fusion.  
>> They even somewhat shy away from MC-lag
>> 
>> This was all while talking about a data center redesign that we are working 
>> on currently.  Replacing ToR VC EX4550’s connected LAG to ASR9K with new 
>> dual QFX5120 leaf to single MX960, dual MPC7E-MRATE
>> 
>> I think we will connect each QFX to each mpc7e card.  Is it best practice to 
>> not interconnect directly between the two QFX’s ? If so why not.
>> 
>> (please forgive, don’t mean to hijack thread, just some good topics going on 
>> here)
>> 
>> Aaron
> _______________________________________________
> juniper-nsp mailing list [email protected]
> https://puck.nether.net/mailman/listinfo/juniper-nsp