Re: [bess] Request discussion on Cumulative Link Bandwidth Draft

Satya Mohanty (satyamoh) Wed, 07 Jul 2021 15:52:47 -0700

Gyan,

Thanks for your interest. Sorry for replying late on this.
Thanks Arie and Jeff for your clarifications.


We have not changed the definition of the Link Bandwidth Ext. Community 
(0x4004) or its definition as non-transitive.
As Arie mentioned previously, we are actually originating it at the AS boundary.

BTW, the cumulative link-bandwidth feature first went into XR in 6.1.1. We can 
incorporate that in the addendum, as well as get similar information from other 
vendors.

Thanks,
--Satya

From: Gyan Mishra <[email protected]>
Date: Saturday, July 3, 2021 at 8:34 AM
To: Jeff Tantsura <[email protected]>
Cc: Arie Vayner <[email protected]>, Robert Raszuk <[email protected]>, "Satya 
Mohanty (satyamoh)" <[email protected]>, "UTTARO, JAMES" <[email protected]>, 
"[email protected]" <[email protected]>
Subject: Re: [bess] Request discussion on Cumulative Link Bandwidth Draft


Thanks Jeff for the clarification.

Thanks

Gyan
On Fri, Jul 2, 2021 at 6:05 PM Jeff Tantsura <[email protected]> wrote:
Gyan,

you are mixing use of BW community as such with cumulative propagation (the 
theme of the draft).
The original community is defined in "Non-Transitive Two-Octet AS-Specific 
Extended Community Sub-Types" IANA section and that has to be changed to allow 
eBGP use cases.
Aggregation is a very useful feature when the prefix with the community 
attached is traversing the fabric and is being regenerated and potentially 
transformed at every hop traversed.
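
For reference, a minimal sketch of the wire format under discussion, assuming the encoding in draft-ietf-idr-link-bandwidth: type 0x40 (non-transitive two-octet AS-specific), sub-type 0x04, a two-octet AS number, and the bandwidth as a 4-octet IEEE float in bytes per second. The function names are hypothetical and only illustrate the layout; this is not any vendor's implementation.

```python
import struct

NON_TRANSITIVE_BIT = 0x40  # RFC 4360 "T" bit in the high-order type octet
LINK_BW_TYPE = 0x40        # non-transitive two-octet AS-specific
LINK_BW_SUBTYPE = 0x04     # Link Bandwidth
AS_TRANS = 23456           # placeholder used when the ASN needs four octets


def encode_link_bw(asn: int, bits_per_second: float) -> bytes:
    """Pack an 8-octet Link Bandwidth extended community (hypothetical helper)."""
    global_admin = asn if asn <= 0xFFFF else AS_TRANS
    bytes_per_second = bits_per_second / 8  # the draft uses bytes, not bits
    return struct.pack("!BBHf", LINK_BW_TYPE, LINK_BW_SUBTYPE,
                       global_admin, bytes_per_second)


def decode_link_bw(community: bytes) -> dict:
    """Unpack the 8 octets and report whether the T bit marks it transitive."""
    type_high, subtype, global_admin, bytes_per_second = struct.unpack(
        "!BBHf", community)
    return {
        "transitive": not (type_high & NON_TRANSITIVE_BIT),
        "subtype": subtype,
        "asn": global_admin,
        "bits_per_second": bytes_per_second * 8,
    }


if __name__ == "__main__":
    wire = encode_link_bw(65001, 10e9)  # a 10G link attached by AS 65001
    print(wire.hex(), decode_link_bw(wire))
```

Because the community sits in the non-transitive registry, the T bit is set and a receiver following RFC 4360 would not propagate it across an AS boundary unchanged.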

The alternative with add-path and potentially path explosion (not to talk about 
operational complexity of add-path and bugs in the implementations) is a rather 
unattractive solution for DC fabrics.

Cheers,
Jeff

On Fri, Jul 2, 2021 at 12:17 PM Gyan Mishra <[email protected]> wrote:
Hi Satya

For EBGP DCs with multi-stage Clos I understand this can be used. However, the 
Cisco, Juniper, Nokia, and Arista implementations, and maybe those of other 
vendors, seem to combine the non-transitive link bw extended community and the 
transitive cumulative link bw extended community into a single feature that 
works for UCMP intra-AS and inter-AS.  Please confirm.

These two drafts seem to be combined by implementations into a single UCMP 
feature for both eBGP and iBGP.

https://datatracker.ietf.org/doc/html/draft-ietf-idr-link-bandwidth-07

https://datatracker.ietf.org/doc/html/draft-mohanty-bess-ebgp-dmz-03

Cisco:

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_bgp/configuration/xe-17/irg-xe-17-book/bgp-link-bandwidth.html

https://community.cisco.com/t5/service-providers-documents/asr9000-xr-understanding-unequal-cost-multipath-ucmp-dmz-link/ta-p/3138853

Juniper:

https://www.juniper.net/documentation/en_US/junos/topics/concept/bgp-multipath-unequal-understanding.html

Nokia:

https://documentation.nokia.com/html/0_add-h-f/93-0074-HTML/7750_SR_OS_Routing_Protols_Guide/bgp.html


Arista:

https://www.arista.com/en/um-eos/eos-border-gateway-protocol-bgp#xx1418621

Kind Regards

Gyan

On Wed, May 26, 2021 at 3:21 PM Satya Mohanty (satyamoh) <[email protected]> wrote:
Hi Jim,

No, they do not.

This draft under discussion is a way to aggregate the link bandwidth in EBGP 
DCs and advertise it upstream.
It works well in multi-stage Clos topology fabrics.
Traffic is demultiplexed (multi-path load balanced) when it arrives at a node 
of each stage (unless that node is the sink).
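
To make the demultiplexing step concrete, here is a minimal sketch of how a node could split traffic across its multipaths in proportion to the link bandwidth learned for each next hop. The function name, next-hop labels, and bandwidth figures are hypothetical, and real implementations typically program integer weights in hardware rather than fractions.

```python
from typing import Dict


def ucmp_shares(path_bandwidth: Dict[str, float]) -> Dict[str, float]:
    """Return each next hop's share of traffic, proportional to its bandwidth."""
    total = sum(path_bandwidth.values())
    if total <= 0:
        raise ValueError("at least one path must carry a positive bandwidth")
    return {next_hop: bw / total for next_hop, bw in path_bandwidth.items()}


if __name__ == "__main__":
    # Hypothetical spine with three downstream leaves (bandwidth in Gbps).
    print(ucmp_shares({"leaf1": 40.0, "leaf2": 20.0, "leaf3": 20.0}))
```

In this example the 40G path would attract half of the traffic and each 20G path a quarter, instead of the equal thirds that plain ECMP would give.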

The draft you are mentioning, 
https://tools.ietf.org/id/draft-ietf-bess-evpn-unequal-lb-06.html is really a 
way to communicate the link-bandwidth across EBGP boundaries.
It is mostly geared from an Inter-AS Option C viewpoint (next-hop unchanged) 
although it also applies to Option B deployments (next-hop-self).
There is no notion of aggregating bandwidth here.

HTH.

Best Regards,
--Satya


From: "UTTARO, JAMES" <[email protected]>
Date: Wednesday, May 26, 2021 at 5:38 AM
To: Gyan Mishra <[email protected]>, Robert Raszuk <[email protected]>
Cc: Jeff Tantsura <[email protected]>, Arie Vayner <[email protected]>, 
"[email protected]" <[email protected]>, "Satya Mohanty (satyamoh)" <[email protected]>
Subject: RE: [bess] Request discussion on Cumulative Link Bandwidth Draft

Does this work and Weighted Multi-Path Procedures for EVPN Multi-homing address 
the same field of use?

Thanks,
              Jim Uttaro

From: BESS <[email protected]> On Behalf Of Gyan Mishra
Sent: Wednesday, May 26, 2021 12:57 AM
To: Robert Raszuk <[email protected]>
Cc: Jeff Tantsura <[email protected]>; Arie Vayner <[email protected]>; 
[email protected]; Satya Mohanty (satyamoh) <[email protected]>
Subject: Re: [bess] Request discussion on Cumulative Link Bandwidth Draft

Across the DC space in general most providers use NVO3 and vxlan source port 
entropy L2/L3/L4 hash which provides per packet uniform 50/50 load balancing at 
the L2 VNI overlay layer, which translates into underlay load balancing of 
flows and thus no polarization.

Across the DC space, speaking from an operator's perspective, under-the-floor 
fiber is not at a premium compared to 100G facilities costs, so the net addition 
of bandwidth can be done fairly quickly. That keeps you ahead of the congestion 
curve and lets you be proactive rather than reactively upgrading bandwidth 
piecemeal here and there ad hoc.

There may still be cases where, even if you have the fiber infrastructure 
available, it's not easy to upgrade and flash every link simultaneously in the 
DC in one or multiple maintenance windows, so you could still be left with some 
uneven bandwidth across the DC that could utilize this feature.

The DC also comes into play for PE-CE "WAN links", which are another use case 
for unequal-cost load balancing using the regenerated cumulative link bandwidth 
community.


I think there is also the use case where the iBGP core P-P links or the eBGP 
PE-PE inter-AS links are all WAN links, where link upgrades tend not to get done 
uniformly in unison. In those cases both the iBGP link bandwidth community and 
the eBGP cumulative regenerated link bandwidth community can be heavily utilized 
for unequal-cost load balancing.  Across the core as well, it is hard to flash 
all links to the same bandwidth all at once, even with under-the-floor fiber, so 
you are left with the requirement for unequal-cost load balancing.

As operators upgrade their DC as well as core infrastructure to an IPv6 
forwarding plane in the move towards SRv6, they can now take advantage of flow 
label entropy for stateless uniform load balancing and elimination of 
polarization.  However, the need for WAN link upgrades of core and DC PE-CE 
still exists, and these may be done piecemeal, so both of the drafts are an 
extremely helpful tool for operators that very much need the unequal-cost load 
balancing capability.

I support both drafts.

Have most vendors implemented this to support both the 2-byte and 4-byte AS 
extended community?  The drafts state 2-byte AS support.

Thanks

Gyan

On Tue, May 25, 2021 at 7:00 PM Robert Raszuk <[email protected]> wrote:
Hi Arie,

The draft draft-ietf-idr-link-bandwidth talks about advertising towards IBGP. 
It does not talk about advertising over EBGP.

While I do support your use case, I think it would be much cleaner to just ask 
for a new ext. community type.

The reason is that, as you illustrate, you may want to accumulate a BGP path's 
bw across a few EBGP hops in the DC. Today there is no way to do so unless you 
want to completely hijack the current lb ext community.

Also I see an analogy here to AIGP RFC although it clearly fits rather poorly 
for those who use BGP as IGP :).

Best, R.

On Wed, May 26, 2021 at 12:22 AM Arie Vayner <[email protected]> wrote:
Jeff,

Actually, the way this draft is written, and how the implementations I'm aware 
of are implemented, this is not really a transitive community. It is a new 
community that is being generated on the AS boundary.
The community value is not carried over, but is calculated based on a 
cumulative value of other received communities, and then advertised as a new 
value across the AS boundary.
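
A minimal sketch of that regeneration step, under stated assumptions: the received values are simply summed over the multipaths for a prefix, and the new community carries the local AS in its Global Administrator field. The class and function names and the AS numbers are hypothetical and are not taken from the draft or from any implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinkBandwidth:
    asn: int                 # Global Administrator: AS that attached the community
    bytes_per_second: float  # bandwidth value carried in the community


def regenerate_cumulative(local_asn: int,
                          received: List[LinkBandwidth]) -> LinkBandwidth:
    """Originate a new community instead of forwarding any received one.

    Assumption: the advertised value is the plain sum of the values received
    on the multipaths for the prefix.
    """
    total = sum(lb.bytes_per_second for lb in received)
    return LinkBandwidth(asn=local_asn, bytes_per_second=total)


if __name__ == "__main__":
    # Hypothetical leaf-facing paths: two 10G links and one 40G link.
    received = [
        LinkBandwidth(65101, 1.25e9),
        LinkBandwidth(65102, 1.25e9),
        LinkBandwidth(65103, 5.0e9),
    ]
    print(regenerate_cumulative(65000, received))
```

Since the output is a freshly attached community, none of the received communities has to cross the AS boundary, which matches the point that the original non-transitive definition is left unchanged.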

Tnx,
Arie

On Tue, May 25, 2021 at 12:55 PM Jeff Tantsura <[email protected]> wrote:
Hi,

I support adoption of the draft as Informational. Please note that the change 
to the transitivity characteristics of the community is requested in another 
draft.
Gyan - please note, while pretty much every vendor has implemented the 
community and relevant data-plane constructs, the initial draft defines the 
community as non-transitive; some vendors have followed that, while some others 
have implemented it as transitive (to support the obvious use case - eBGP in DC).


Cheers,
Jeff
On May 22, 2021, 8:38 AM -0700, Satya Mohanty (satyamoh) <[email protected]>, wrote:

Hi all,

On behalf of all the authors, we request a discussion of the draft 
https://datatracker.ietf.org/doc/html/draft-mohanty-bess-ebgp-dmz-03 
and subsequent WG adoption.
This draft extends the usage of the DMZ link bandwidth to scenarios where the 
cumulative link bandwidth needs to be advertised to a BGP speaker.
Additionally, there is provision to send the link bandwidth extended community 
to EBGP speakers via configurable knobs. Please refer to sections 3 and 4 for 
the use cases.

This feature has multiple-vendor implementations and has been deployed by 
several customers in their networks.

Best Regards,
--Satya
_______________________________________________
BESS mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/bess
--

Gyan Mishra

Network Solutions Architect

Email [email protected]

M 301 502-1347
