Resending from the email address I'm subscribed to...

> From: Pedro Marques <[email protected]>
> Subject: Re: draft-marques-l3vpn-mcast-edge-00
> Date: May 28, 2012 8:57:53 PM PDT
> To: Petr Lapukhov <[email protected]>
> Cc: Yiqun Cai <[email protected]>, "[email protected]" 
> <[email protected]>, "[email protected]" <[email protected]>, 
> "[email protected]" <[email protected]>
> 
> 
> On May 27, 2012, at 11:11 PM, Petr Lapukhov wrote:
> 
>> Hi Pedro,
> 
> Petr,
> Thank you for your comments. Answers inline.
> 
>> 
>> Thanks for an interesting read! However, I have some concerns regarding the 
>> problem statement in the document:
>> 
>>> For Clos topologies with multiple stages native multicast support
>>> within the switching infrastructure is both unnecessary and
>>> undesirable.  By definition the Clos network has enough bandwidth to
>>> deliver a packet from any input port to any output port.  Native
>>> multicast support would however make it such that the network would
>>> no longer be non-blocking.  Bringing with it the need to devise
>>> congestion management procedures.
>> 
>> Here they are:
>> 
>> 1) Multicast routing over Clos topology could be non-blocking provided that 
>> some criteria on Clos topology dimensions are met and multicast distribution 
>> tree fan-outs are properly balanced at ingress and middle stages of the Clos 
>> fabric.
> 
> Multicast over a Clos topology creates congestion management issues. One way 
> to address the problem, in large-scale Clos topologies, is to eliminate 
> native multicast in the fabric. That is the approach taken in several 
> networks, including networks that are fully enclosed in a chassis or set of 
> chassis.
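The congestion issue with in-fabric replication can be made concrete with some toy bandwidth accounting (the numbers and the `egress_demand` helper below are illustrative, not taken from the draft):

```python
# Toy accounting with unit-capacity links. With unicast, a Clos fabric sized
# so that middle-stage capacity matches total ingress capacity is
# non-blocking: admitted ingress load bounds egress load. With in-fabric
# multicast replication, one admitted unit of ingress traffic can expand
# into `fanout` units of egress demand, so ingress admission no longer
# bounds the load on fabric links.

def egress_demand(ingress_rates, fanouts):
    # Each source's traffic is replicated to `fanout` egress ports.
    return sum(r * f for r, f in zip(ingress_rates, fanouts))

ingress = [1.0] * 16                          # 16 ports, each fully loaded
unicast = egress_demand(ingress, [1] * 16)    # matches ingress load
multicast = egress_demand(ingress, [8] * 16)  # 8x the admitted load
print(unicast, multicast)
```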
> 
>> 
>> 2) Congestion management in Clos networks would be necessary in any case, 
>> due to statistical multiplexing and possibility of (N -> 1) port traffic 
>> flow.
> 
> In practice, many networks run Clos topologies with no congestion 
> management support. The assumption is that if hash-based load balancing of 
> flows is "good enough" and the flows are small compared to link capacity, 
> the fabric behaves as non-blocking. This allows one to build very 
> large-scale Clos fabrics with off-the-shelf and/or heterogeneous 
> components, where each switch works independently. Congestion management at 
> large scale is a very thorny issue…
> 
> I believe that there are several efforts in the IEEE, under the umbrella of 
> "data-center Ethernet", to bring global congestion notification/flow 
> control into a heterogeneous environment. It is my understanding that there 
> is a non-trivial number of networks that prefer to operate with simple 
> hash-based mechanisms.
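The hash-based flow placement described here can be sketched as follows; the `ecmp_uplink` function and the synthetic flow population are illustrative assumptions, not taken from any particular switch implementation:

```python
import hashlib
import random

def ecmp_uplink(five_tuple, n_uplinks):
    # All packets of a flow hash to the same uplink, so per-flow packet
    # ordering is preserved while distinct flows spread across uplinks.
    digest = hashlib.md5(repr(five_tuple).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_uplinks

# Many small flows: per-uplink flow counts end up close to uniform.
random.seed(1)
flows = [("10.0.%d.%d" % (random.randrange(256), random.randrange(256)),
          "10.1.0.1", 6, random.randrange(1024, 65536), 80)
         for _ in range(100_000)]
counts = [0] * 8
for ft in flows:
    counts[ecmp_uplink(ft, 8)] += 1
print(counts)
```

The flip side is that a single large "elephant" flow still lands on one uplink, which is why this only works well when individual flows are small relative to link capacity.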
> 
>> 3) The "ingress unicast replication" in VPN forwarder creates the following 
>> issues:
>> 
>> 3.1) If done at software hypervisor level, it will most likely overload 
>> physical uplink(s) on the server: N replicas sent as opposed to 1 in case of 
>> native multicast
> 
> This is the main rationale for this work. One could have started with just 
> plain ingress replication, but in that case the ingress would have to 
> replicate to the full membership of the group. With an edge replication 
> tree, the number of copies each node sends is limited to N.
> As with any other network design, it is a question of trade-offs. The 
> authors believe there is a non-trivial number of applications (e.g. 
> discovery) where this is a useful approach.
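The trade-off between plain ingress replication and a bounded fan-out edge replication tree is simple arithmetic; the sketch below assumes a balanced N-ary tree, which is an illustrative simplification rather than something the draft mandates:

```python
import math

def ingress_copies(group_size):
    # Plain ingress replication: the source sends one copy per receiver.
    return group_size - 1

def max_copies_per_node(group_size, n):
    # Edge replication tree with fan-out bound N: no node sends more
    # than N copies, regardless of group size.
    return min(group_size - 1, n)

def tree_depth(group_size, n):
    # Delivery hop count grows only logarithmically with group size
    # (assuming a balanced N-ary replication tree).
    return math.ceil(math.log(group_size, n))

for g in (10, 100, 1000):
    print(g, ingress_copies(g), max_copies_per_node(g, 8), tree_depth(g, 8))
```

So for a 1000-member group, plain ingress replication sends 999 copies from the source, while a fan-out-8 tree caps every node at 8 copies at the cost of a few extra replication hops.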
> 
>> 3.2) If done at hardware switch level (edge of physical Clos topology), it 
>> cannot leverage hardware capabilities for multicast replication, and thus 
>> could be difficult to implement and will stress the switch internal fabric.
> 
> Building hardware with no multicast support can also simplify the hardware 
> design.
> 
>> 
>> 4) If L3 VPN spans WAN for Inter-DC communications, unicast replication 
>> makes any WAN multicast optimization impossible, unless there is a 
>> "translating" WAN gateway that will forward packets as native multicast.
> 
> The document only covers intra-DC scenarios, as of now. For WAN traffic, we 
> do assume that there are systems that support L3VPN multicast as defined 
> currently.
> 
>> 5) Optimizing overlay multicast distribution tree could be difficult, since 
>> underlying network metrics may be hidden from VPN gateways.
> 
> In several practical scenarios I am aware of, the intra-DC network has two 
> cost points: same rack, different racks. Even in scenarios where there are 
> multiple metrics, the BGP signaling gateway can be made aware of the 
> physical topology of the network. My understanding is that the intra-DC 
> network can be optimized.
> 
>> 
>> I'm reviewing the rest of the document, and hopefully can come up with more 
>> comments later.
> 
> Thank you very much for your attention.
> 
>> 
>> Best regards,
>> 
>> Petr Lapukhov
>> Microsoft
>> 
> 
