Robert,

THX for picking it up.
See inline.
--
Rafal Szarecki


Each ASBR will propagate its best route to the on-site RR

That is precisely the moment when I think we need to seriously consider the
consequences.

point #1 - more and more stuff in BGP is opaque to BGP and plays no role in
best path selection. So if someone really needs any of that information, they
must not use this solution, and that should be spelled out very clearly, as
this information will be lost.

[RJS] Well, the basic BGP rule is that only a single path per prefix is
advertised, unless you enable ADD-PATH. Right? So the vanilla behavior of an
ASBR is to select one best path and advertise it to the RR. Quite commonly, the
BGP NH is changed to the IP of the ASBR’s loopback. In my solution it is changed
to the IP of the ANH. I do not see how the proposed solution is going to hide
information that is otherwise available.

So let’s consider a situation where I would like to export all paths from an
ASBR to some kind of “Route Controller” for the purpose of analytics or EPE. If
this controller is not my on-site RR, I can still do this w/o any problem via
BMP or using ADD-PATH. Just do not set the BGP NH to the ANH in the export
policy toward that controller.

Finally, let’s assume we want ADD-PATH from the ASBR to the RR, for whatever
reason. In this case the BGP NH indeed becomes important. In the proposed
solution, the value of the BGP NH (ANH) depends on which session a given path
was learned on. So if eBGP session1 has associated ANH1 and eBGP session2 has
associated ANH2, then ADD-PATH from the ASBR to the on-site RR will advertise
both paths with unique BGP NH values.
Now let’s assume eBGP session3 also has associated ANH1. In this case only the
path learned from session1 xor session3 will be advertised to the RR (plus the
path from session2). Yes, some information is suppressed now. The operator can
control what may be suppressed by associating the same ANH with a given set of
eBGP sessions, or not. That is configurable. The corner case is a 1:1 mapping
of eBGP session to ANH, which is very similar to keeping the BGP NH unchanged
(peer IP), with the notable difference that we can remove the ANH from the IGP
based on eBGP session state, regardless of interface state.
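
To make the session-to-ANH grouping concrete, here is a minimal sketch of what
the ASBR would send to the RR with ADD-PATH. Assumptions, all hypothetical: the
session names and SESSION_TO_ANH map mirror the example above, the ASBR keeps
only the best path per (prefix, ANH), and local_pref plus session name stand in
for the full BGP decision process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    prefix: str
    session: str     # eBGP session the path was learned on
    local_pref: int  # stand-in for the full BGP decision process


# Hypothetical operator config: eBGP session -> Abstract Next Hop (ANH).
SESSION_TO_ANH = {
    "session1": "ANH1",
    "session2": "ANH2",
    "session3": "ANH1",  # shares ANH1 with session1
}


def rank(p):
    # Lower sorts first: prefer higher local_pref, then lower session name.
    return (-p.local_pref, p.session)


def add_path_advertisements(paths):
    """Paths an ASBR sends to the RR with ADD-PATH, keyed by (prefix, BGP NH).

    Assumed behaviour: the BGP NH is rewritten to the ANH of the ingress
    session, and only the best path per (prefix, ANH) is advertised.
    """
    best = {}
    for p in paths:
        key = (p.prefix, SESSION_TO_ANH[p.session])
        if key not in best or rank(p) < rank(best[key]):
            best[key] = p
    return best


learned = [
    Path("203.0.113.0/24", "session1", 100),
    Path("203.0.113.0/24", "session2", 100),
    Path("203.0.113.0/24", "session3", 200),
]
adv = add_path_advertisements(learned)
for (prefix, nh), p in sorted(adv.items()):
    print(prefix, nh, p.session)
# 203.0.113.0/24 ANH1 session3
# 203.0.113.0/24 ANH2 session2
```

The session1 path is suppressed because it shares ANH1 with session3. Mapping
session3 to its own ANH (the 1:1 corner case) would let all three paths reach
the RR.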

point #2 - what is the real problem we are solving? Depending on the BGP
implementation, a full table takes anywhere from 300-450 MB of RAM. An extra
path would be another 150 MB. This is all control-plane memory, so it is pretty
cheap and happily fits any x86 box placed to act as an RR.

[RJS] I agree. Memory footprint at the RR is not an issue. Convergence at scale
is.
Let’s assume that at site 1 I have 4 ASBRs connected to AS_2, each with 1
session, and each ASBR learns 300k prefixes from AS_2, all of them best from
that ASBR’s POV. So 300k paths per ASBR, 300k pfx per ASBR, 1 path per prefix
per ASBR.
The RR gets 4 x 300k paths with the BGP NH set to the ASBR1-2-3-4 loopbacks,
and sends them w/ ADD-PATH to the on-site CR.
When the eBGP session of one ASBR (say ASBR1) fails, ASBR1 has to withdraw 300k
paths from the RR, and the RR needs to withdraw 300k paths from the CR. Until
this is done, the CR will keep sending ¼ of the traffic to ASBR1, because the
BGP NH (ASBR1’s loopback) is still reachable.
Now if ANH is used, the CR sees 4 paths per prefix with BGP NH == ANH1-2-3-4
respectively. When the eBGP session of one ASBR (say ASBR1) fails, ASBR1 removes
ANH1 from the IGP and starts to withdraw 300k paths from the RR, and the RR
needs to withdraw 300k paths from the CR. As soon as the CR sees the IGP update
(ANH1 unreachable), it can mark all 300k paths that have BGP NH == ANH1 as
unusable and stop forwarding to ASBR1. If the CR runs BGP PIC Edge, this could
be sub-second.
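
The difference between the two cases comes down to how many FIB objects have to
change. Below is a toy model, not any vendor's FIB: each prefix references
shared BGP next hops whose IGP reachability is tracked once, in the spirit of
PIC Edge indirection. The class and method names (Fib, igp_unreachable,
bgp_withdraw) and the "lo1".."lo4" loopback labels are hypothetical.

```python
class Fib:
    """Toy FIB where prefixes reference shared BGP next-hop objects.

    Assumption: reachability is stored once per BGP next hop, so an IGP event
    for that next hop affects every prefix using it without touching them.
    """

    def __init__(self):
        self.paths = {}         # prefix -> list of BGP next hops
        self.nh_reachable = {}  # BGP next hop -> reachable via IGP?

    def add(self, prefix, next_hops):
        self.paths[prefix] = list(next_hops)
        for nh in next_hops:
            self.nh_reachable.setdefault(nh, True)

    def igp_unreachable(self, nh):
        # One IGP update flips one shared object, independent of prefix count.
        self.nh_reachable[nh] = False
        return 1  # FIB objects modified

    def bgp_withdraw(self, nh):
        # Per-prefix BGP withdrawals: one change per affected prefix.
        changed = 0
        for nhs in self.paths.values():
            if nh in nhs:
                nhs.remove(nh)
                changed += 1
        return changed

    def usable(self, prefix):
        return [nh for nh in self.paths[prefix] if self.nh_reachable[nh]]


N = 300_000
fib_loopback = Fib()  # BGP NH = ASBR loopbacks, which stay in the IGP
fib_anh = Fib()       # BGP NH = ANH1..ANH4
for i in range(N):
    pfx = f"p{i}"
    fib_loopback.add(pfx, ["lo1", "lo2", "lo3", "lo4"])
    fib_anh.add(pfx, ["ANH1", "ANH2", "ANH3", "ANH4"])

# The eBGP session on ASBR1 fails.
print(fib_loopback.bgp_withdraw("lo1"))  # 300000
print(fib_anh.igp_unreachable("ANH1"))   # 1
print(fib_anh.usable("p0"))              # ['ANH2', 'ANH3', 'ANH4']
```

In the ANH case the per-prefix withdrawals still arrive later, but forwarding
no longer depends on them.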

The inter-site operation (advertising only one path w/ a BGP NH representing
“a set of eBGP sessions from a set of ASBRs”) is just one more optimization.
Let’s call this SP_ANH (Site-Peer ANH, in contrast to the ASBR-Peer ANH
discussed above).

  *   If the RR advertises only one path to other sites and the BGP NH is the
loopback of one of the ASBRs (or an ASBR-Peer ANH), then what is convergence in
case of this ASBR’s failure? The RR has to send 300k paths w/ a new BGP NH.
Until this is done, remote sites will send traffic somewhere else, not to the
best egress site.
  *   If the RR advertises all 4 paths to other sites, each with the BGP NH set
to the loopback of its ASBR (or the ASBR-Peer ANH), then an IGP update removing
the failed ASBR’s address will allow for quick restoration (as the other 3 paths
are available everywhere). But in a multi-path scenario, we have just created a
2-3 level ECMP structure on remote BGP speakers:
prefix--> (list of 4 BGP NH) --> each BGP NH --> list of IGP ECMP neighbours.
That is costly to manage in S/W and in HW.
  *   If the RR advertises only one path to other sites and the BGP NH is the
Site-Peer ANH as in this proposal, then the Site-Peer ANH is not removed from
the IGP (as the other ASBRs still have sessions with the peer). Remote sites
keep sending traffic using pre-failure data until the BGP update from the RR
comes. And when it comes, it will have the same BGP NH as the pre-failure path,
so there will be no need to update the FIB. The FIB structure will also be
simpler and less costly:
prefix--> one BGP NH --> list of IGP ECMP neighbours.
Some merchant chips have really limited ECMP capability…
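
The three options can be compared with a small toy model of one remote BGP
speaker after ASBR1’s session fails. Everything here is an illustrative
assumption: the ANH names, the rule that an ASBR-Peer ANH "ANHn" tracks the
session on ASBRn, the rule that SP_ANH stays in the IGP while any ASBR still
peers with AS_2, and counting "FIB rewrites" as one per prefix whenever the set
of BGP NHs changes.

```python
ASBRS = ["ASBR1", "ASBR2", "ASBR3", "ASBR4"]


def anh_reachable(nh, session_up):
    """Is this ANH still advertised in the IGP, given per-ASBR session state?"""
    if nh == "SP_ANH":
        # Site-Peer ANH: kept while any ASBR at the site still peers with AS_2.
        return any(session_up.values())
    # ASBR-Peer ANH "ANHn": tied to the eBGP session on ASBRn.
    return session_up["ASBR" + nh[len("ANH"):]]


def remote_impact(nh_before, nh_after, session_up, num_prefixes):
    """Toy model of one remote BGP speaker after a failure at the egress site.

    Returns (usable_nhs_before_bgp_update, per_prefix_fib_rewrites, ecmp_levels).
    """
    usable = [nh for nh in nh_before if anh_reachable(nh, session_up)]
    # A prefix entry must be rewritten only if its set of BGP NHs changes.
    rewrites = 0 if set(nh_after) == set(nh_before) else num_prefixes
    # Several BGP NHs per prefix add a BGP ECMP level above the IGP ECMP level.
    ecmp_levels = 2 if len(nh_before) > 1 else 1
    return usable, rewrites, ecmp_levels


N = 300_000
up = {a: a != "ASBR1" for a in ASBRS}  # ASBR1's eBGP session has failed

options = {
    "one path, ASBR-Peer ANH": (["ANH1"], ["ANH2"]),
    "four paths, ASBR-Peer ANHs": (["ANH1", "ANH2", "ANH3", "ANH4"],
                                   ["ANH2", "ANH3", "ANH4"]),
    "one path, SP_ANH": (["SP_ANH"], ["SP_ANH"]),
}
for name, (before, after) in options.items():
    print(name, remote_impact(before, after, up, N))
```

Only the SP_ANH option keeps a usable NH with zero per-prefix rewrites and a
single ECMP level; the four-path option recovers via the IGP but pays with a
second ECMP level.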

Now you made two observations:

-A- I have a weak router which is fine from a bw POV but does not have the
steam to handle 5M paths - my take is: ok, send it 1 or 2 paths from the RRs
and be done.

-B- The withdrawal of all BGP routes takes so long - well, let's observe that
we can withdraw all routes from a given peer with a single BGP message using
techniques as described in
https://tools.ietf.org/html/draft-raszuk-aggr-withdraw-00

[RJS] ACK. We can always extend the protocol to do something, or develop a new
one. I think I also saw a proposal for a new NLRI to advertise “NH
invalidation”.
This proposal does not require any change to BGP, so it can interoperate with
virtually any implementation.

Bottom line, I think using Abstract Next Hop can help for specific SAFIs in
specific topologies to reduce the amount of control-plane state, if that is
ever an issue. Yes, BGP control-plane handling is implementation specific, and
some code does it more efficiently than others.

[RJS] Agree. The draft is focused on a specific architecture/topology
(scale-out peering), where the operator wants and assumes ECMP of egress
traffic among the N ASBRs existing at a given site.
Not that ANH has no other possible uses, but this draft is about this use case.

Best,
R.


_______________________________________________
GROW mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/grow
