Hi Lars

I met with the authors on Friday 5/14 and we went over my questions and
review of the draft in detail.

I will respond today with a detailed update on the status of my review,
based on feedback from the authors at Friday's meeting: the draft is in
a “Ready” state with minor updates and recommendations.

Kind Regards

Gyan

On Mon, May 17, 2021 at 4:38 AM Lars Eggert <l...@eggert.org> wrote:

> Gyan, thank you for your review. I have not seen a response from the
> editors to your review yet, and so I'm holding off for the moment on
> entering a ballot for this document.
>
> Authors, would you please respond to Gyan's review?
>
> Thanks,
> Lars
>
>
> > On 2021-4-29, at 8:46, Gyan Mishra via Datatracker <nore...@ietf.org> wrote:
> >
> > Reviewer: Gyan Mishra
> > Review result: Not Ready
> >
> > I am the assigned Gen-ART reviewer for this draft. The General Area
> > Review Team (Gen-ART) reviews all IETF documents being processed
> > by the IESG for the IETF Chair.  Please treat these comments just
> > like any other last call comments.
> >
> > For more information, please see the FAQ at
> >
> > <https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.
> >
> > Document: draft-ietf-bess-datacenter-gateway-??
> > Reviewer: Gyan Mishra
> > Review Date: 2021-04-28
> > IETF LC End Date: 2021-04-29
> > IESG Telechat date: Not scheduled for a telechat
> >
> > Summary:
> >   This document defines a mechanism using the BGP Tunnel Encapsulation
> >   attribute to allow each gateway router to advertise the routes to the
> >   prefixes in the Segment Routing domains to which it provides access,
> >   and also to advertise on behalf of each other gateway to the same
> >   Segment Routing domain.
> >
> > This draft needs to provide more clarity on the use case: where this
> > mechanism would apply, and how it would be used and implemented.  From
> > reading the specification it appears there are some technical gaps.
> > There are some major issues with this draft, and I don’t think it is
> > ready yet.
> >
> > Major issues:
> >
> > Abstract comments:
> > The abstract mentions the use of Segment Routing within the Data
> > Center.  Is that a requirement for this specification to work, as it
> > is mentioned throughout the draft?  Technically I would think the
> > concept of gateway discovery is feasible without requiring SR within
> > the Data Center.
> >
> > The concept of load balancing is a bigger issue brought up in this
> > draft, as it is the problem statement the draft is trying to solve; I
> > will address it in the introduction comments.
> >
> > Introduction comments:
> > In the introduction the use case is expanded much further, to any
> > functional edge AS, per the verbiage below.
> >
> > OLD
> >
> >   “SR may also be operated in other domains, such as access networks.
> >   Those domains also need to be connected across backbone networks
> >   through gateways.  For illustrative purposes, consider the Ingress
> >   and Egress SR Domains shown in Figure 1 as separate ASes.  The
> >   various ASes that provide connectivity between the Ingress and Egress
> >   Domains could each be constructed differently and use different
> >   technologies such as IP, MPLS with global table routing native BGP to
> >   the edge, MPLS IP VPN, SR-MPLS IP VPN, or SRv6 IP VPN”
> >
> > This paragraph expands the use case to any ingress or egress stub
> > domain: Data Center, Access, or any other.  If that is the case,
> > should the draft name change to something like “stub edge domain
> > services discovery”?  As this draft can be used for any such domain, I
> > would not preclude any use case: make the GW discovery open to any
> > service GW edge function and change the draft name to something more
> > appropriate.
> >
> > This paragraph also says “for illustrative purposes”, which is fine,
> > but it then expands the overlay/underlay use cases.  I believe this
> > mechanism can only be used with technologies that have an
> > overlay/underlay split, which would preclude any use case with just an
> > underlay global routing table, such as the “IP, MPLS with global table
> > routing native BGP to the edge” case mentioned.  IP or global table
> > routing would be an issue because this specification requires setting
> > an RT, and an export/import RT policy, for the discovery of routes
> > advertised by the GWs.  As I don’t think this solution would work
> > technically for global table routing, I will update the above
> > paragraph to preclude global table routing.  We can add it back in if
> > we figure that out, but I don’t think any public or private operator
> > would change from a global table carrying all BGP prefixes in the
> > underlay to the drastic change of a VPN overlay carrying all the
> > any-to-any prefixes, which would be a prerequisite to using this
> > draft.
> >
> > From this point forward I am going to assume we are using VPN overlay
> > technology such as SR or MPLS.
> >
> > NEW
> >
> >   “SR may also be operated in other domains, such as access networks.
> >   Those domains also need to be connected across backbone networks
> >   through gateways.  For illustrative purposes, consider the Ingress
> >   and Egress SR Domains shown in Figure 1 as separate ASes.  The
> >   various ASes that provide connectivity between the Ingress and
> >   Egress Domains could be two as shown in Figure 1, or could be many
> >   more, as with the public Internet use case, and each may be
> >   constructed differently and use different technologies such as MPLS
> >   IP VPN, SR-MPLS IP VPN, or SRv6 IP VPN, with a “BGP Free” core.”
> >
> > This may work without a “BGP Free” core, but to simplify the design
> > complexity I think we should constrain it to a “BGP Free” core
> > transport layer.  SR-TE path steering also gets much more complicated
> > if all P routers are running BGP.  I think in this example we can even
> > explicitly say it shows the public Internet, as that would be one of
> > the primary use cases.
> >
> > This paragraph is confusing to the reader
> >
> > As a precursor to this paragraph, I think it may be a good idea to
> > state whether we are talking about global table IP-only routing or VPN
> > overlay technology with SR/MPLS underlay transport.  That will make
> > this section much easier to understand.
> >
> > In the Figure 1 drawing you should give an AS number to both the
> > ingress domain and the egress domain, so the reader does not have to
> > guess whether the connection to the egress or ingress domain is iBGP
> > or eBGP, and state eBGP in the text below.  Let’s also note that the
> > intermediate ASes in the middle, depicted as two in the diagram for
> > illustration, could be many operator domains, such as when traversing
> > the public Internet.  In the drawing I would replace ASBR with PE, as
> > per this solution I am stating it has to be a VPN overlay paradigm and
> > not global routing.  Also, in the VPN overlay scenario, inter-AS
> > peering is almost always between PEs and not a separate dedicated
> > device serving a special “ASBR-ASBR” function; the PE acts as the
> > border node providing the “ASBR”-type function.  So in the re-write I
> > am assuming the drawing has been updated, changing ASBR to PE.  Let’s
> > give each node a number so that we can be clear in the text exactly
> > which node we are referring to.  In the drawing, please update so that
> > GW1 peers to PE1, GW2 peers to PE2, and GW3 peers to PE3.  GW3 also
> > peers to GW4, and GW2 peers to GW5; GW4 and GW5 are part of AS3.  In
> > the AS1-AS2 peering, the top peer would be PE6 to PE8 and the bottom
> > peer PE7 to PE9, so PE6 and PE7 are in AS1 and PE8 and PE9 are in AS2.
> > I made the bottom two ASBRs in AS3, now called GW4 and GW5, for the
> > selective deterministic load balancing used later in the problem
> > statement.
> >
> > One major problem with this problem statement is that its claim about
> > GW load balancing is incorrect: it asserts that load balancing does
> > not work today in the topology given in Figure 1.  Edge GW load
> > balancing is based on the iBGP tie-breaker in the BGP path selection,
> > which is lowest IGP underlay metric; as long as the metrics are equal
> > and iBGP multipath is enabled, you can load balance to the egress PE1
> > and PE2 endpoints.  So in this case, flows coming from AS1 into AS2
> > hit an intermediate P router which has iBGP multipath enabled and,
> > say, equal cost to the next-hop attribute (assuming next-hop-self is
> > set, so the cost to loopback0 on PE1 and to loopback0 on PE2 is, say,
> > 10), and you now have a BGP multipath.  What is required, though, is
> > that the RD be unique: in a “BGP Free” core RR environment where all
> > PE route-reflector clients peer to the RR, the RD must be unique for
> > the RR to reflect all paths to all the egress PE edges.  BGP add-paths
> > is only used if you have a Primary/Backup routing setup where PE1-GW1
> > has a 0x prepend and PE2-GW2 has a 1x prepend; with BGP add-paths
> > along with BGP PIC Edge you then have a pre-programmed edge backup
> > path.  So add-paths is not something that helps load balancing and is
> > in fact orthogonal to it, as it is for Primary/Backup routing rather
> > than Active/Active load balancing.  Load balancing with a VPN overlay
> > is simply achieved with a unique RD per PE, iBGP multipath, and
> > equal-cost paths to the underlay recursive IGP-learned next-hop
> > attribute, in this case the PE loopback0, per the next-hop rewrite via
> > “next-hop-self” done on the PE-RR peering in a standard VPN overlay
> > topology.  As far as load balancing in the underlay, what I have
> > stated is independent of SR-TE; however, with an SR-TE candidate path,
> > the ECMP spray to the egress PE / egress GW AS can also happen with a
> > prefix-SID.
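
To make the tie-break concrete, here is a rough Python sketch of the RR
reflection and iBGP multipath behavior I describe above; the path
records, RD values, and IGP costs are hypothetical, not from the draft:

```python
# Sketch (illustrative only): why a unique RD per PE lets a route
# reflector preserve both gateways' paths, and how equal IGP cost to the
# next hops yields an Active/Active multipath.

def rr_best_paths(paths):
    """A route reflector keeps one best path per (RD, prefix) key, so
    distinct RDs preserve both paths for the same VPN prefix."""
    best = {}
    for p in paths:
        key = (p["rd"], p["prefix"])
        if key not in best:
            best[key] = p
    return list(best.values())

def ibgp_multipath(paths, igp_cost):
    """Keep all reflected paths whose next hop has the lowest IGP metric
    (the iBGP multipath tie-breaker)."""
    lowest = min(igp_cost[p["next_hop"]] for p in paths)
    return [p for p in paths if igp_cost[p["next_hop"]] == lowest]

# VPN prefix X advertised by PE1 (learned from GW1) and PE2 (from GW2),
# each PE using a unique RD and next-hop-self (its loopback0).
paths = [
    {"rd": "65002:1", "prefix": "X", "next_hop": "PE1-lo0"},
    {"rd": "65002:2", "prefix": "X", "next_hop": "PE2-lo0"},
]
igp_cost = {"PE1-lo0": 10, "PE2-lo0": 10}  # equal underlay metric

reflected = rr_best_paths(paths)
multipath = ibgp_multipath(reflected, igp_cost)
assert len(multipath) == 2   # Active/Active ECMP to PE1 and PE2
```

If both PEs used the same RD, `rr_best_paths` would keep only one path,
which is exactly why the unique RD per PE matters here.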
> >
> > OLD
> >   Suppose that there are two gateways, GW1 and GW2 as shown in
> >   Figure 1, for a given egress SR domain and that they each advertise a
> >   route to prefix X which is located within the egress SR domain with
> >   each setting itself as next hop.  One might think that the GWs for X
> >   could be inferred from the routes' next hop fields, but typically it
> >   is not the case that both routes get distributed across the backbone:
> >   rather only the best route, as selected by BGP, is distributed.  This
> >   precludes load balancing flows across both GWs.
> >
> > I am rewriting the text in the NEW, as there is some discrepancy
> > about which routes get distributed across the backbone.  I am
> > completely re-writing it to make clearer what we are trying to state,
> > as the text appears to be technically incorrect.  I will use the BGP
> > route flow to help depict the routing and get to the problem statement
> > we are trying to portray.
> >
> > NEW
> >
> >   Suppose that there are two gateways, GW1 and GW2 as shown in
> >   Figure 1, for a given egress SR domain, and each gateway advertises
> >   a VPN prefix X via eBGP to the AS2 core domain with the underlay
> >   next hop set to GW1 or GW2.  In this case we are Active/Active load
> >   balancing: PE1 and PE2 receive the VPN prefix and advertise VPN
> >   prefix X into the domain with next-hop-self set on the PE-RR peering
> >   to the PE’s loopback0.  The P routers within the domain have ECMP
> >   paths, with an IGP metric tie, to egress PE1 and egress PE2 for VPN
> >   prefix X learned from GW1 and GW2.  An SR-TE path can now be
> >   stitched: Segment-1 from GW3 to PE3, then Segment-2 from PE3 via PE6
> >   and PE7 to PE8 and PE9, then to the egress domain via PE1 and PE2 to
> >   GW1 and GW2.  In this case, however, we don’t want the traffic to be
> >   steered SR-TE load-balanced via ingress GW3; we want to take GW3 out
> >   of rotation and load balance traffic to GW4 and GW5 instead.
> >
> > **The text above provides the updated selective deterministic gateway
> > steering described below to achieve the goal.  I think that may have
> > been the intent of the authors, and I am just making it clearer.**
> >
> > As for the problem statement: GW load balancing can easily occur in
> > the underlay as stated, so that is not the problem.
> >
> > In my mind, the problem statement we want to describe in both the
> > Abstract and the Introduction is not vanilla gateway load balancing,
> > but rather a predictable, deterministic method of selecting the
> > gateways to be used: each VPN prefix now has a descriptor attached,
> > the Tunnel Encapsulation attribute, which contains multiple Tunnel
> > TLVs, one or more for each “selected gateway”, where each Tunnel TLV
> > contains a tunnel egress endpoint sub-TLV that identifies the gateway
> > for the tunnel.  Maybe we can have a priority field in the sub-TLV for
> > a pecking-order preference of which GWs are pushed into the GW hash
> > selected for the SR-ERO path to be stitched end to end.  So let’s say
> > you had 10 GWs: you could break them up into two or more tiers, with
> > gateways 1-5 primary and 6-10 backup (for various reasons), so you can
> > basically pick and choose, based on priority, which GWs get added to
> > the GW hash.
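
A rough sketch of the priority idea I am suggesting; note the priority
field does not exist in the draft, and all names and values below are
made up for illustration:

```python
# Sketch (my suggestion, not in the draft): a hypothetical priority
# field in the tunnel egress endpoint sub-TLV used to pick which
# gateways enter the load-balancing hash.

def select_gateways(tunnel_tlvs):
    """Return the active gateways sharing the best (lowest) priority;
    lower tiers are used only when no higher-tier gateway is active."""
    active = [t for t in tunnel_tlvs if t["active"]]
    if not active:
        return []
    best = min(t["priority"] for t in active)
    return [t["gw"] for t in active if t["priority"] == best]

# 10 GWs: GW1-GW5 primary (priority 1), GW6-GW10 backup (priority 2).
tlvs = [{"gw": f"GW{i}", "priority": 1 if i <= 5 else 2, "active": True}
        for i in range(1, 11)]

print(select_gateways(tlvs))   # only the primary tier enters the hash

# If the whole primary tier fails, the backup tier takes over.
for t in tlvs[:5]:
    t["active"] = False
print(select_gateways(tlvs))
```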
> >
> > I have some feedback and comments on the solution and how best to
> > write the verbiage to make it clearer to the reader.
> >
> > In the solution, as far as the RT to attach for the GW
> > auto-discovery: with this new RT we are essentially creating a new VPN
> > RIB that has prefixes from all the selected gateways discovered from
> > the Tunnel Encapsulation attribute TLV.
> >
> > What is really confusing in the text here is whether the Tunnel
> > Encapsulation attribute is being attached to the underlay recursive
> > route to the next-hop attribute or to the VPN overlay prefix.  The
> > reason I think it is being attached to the VPN overlay prefix and not
> > the underlay next-hop attribute is: how would you otherwise create
> > another transport RIB?  And if you are creating a new transport RIB,
> > there is already a draft by Kaliraj Vairavakkalai, as well as BGP-LU
> > (SAFI 4, labeled unicast), that exists today to advertise next hops
> > between domains for an end-to-end LSP load-balanced path.
> >
> >
> > https://tools.ietf.org/html/draft-kaliraj-idr-bgp-classful-transport-planes-07
> >
> > IANA code point below
> > 76      Classful-Transport SAFI
> > [draft-kaliraj-idr-bgp-classful-transport-planes-00]
> >
> > Also, in line with CT, another option is BGP-LU SAFI 4 to import the
> > loopbacks between domains, i.e., the next-hop attribute to be
> > advertised into the core for an end-to-end LSP.  So the BGP-LU SAFI 4
> > RIB could be used for the GW next-hop advertisement between domains,
> > so that there is visibility of all the egress PE loopback0 addresses
> > between domains.  You can either stitch segmented LSPs, like inter-AS
> > option B SR-TE stitching, using the next-hop-self PE-RR next-hop
> > rewrite on each of the PEs within the Internet domain, or you could
> > import all the PE loopbacks from all ingress and egress domains into
> > the Internet domain, similar to inter-AS option C, to create an
> > end-to-end LSP and instantiate an end-to-end SR-TE path.
> >
> > Maybe you could attach the RT and Tunnel Encapsulation attribute
> > Tunnel TLV endpoint sub-TLV to the VPN overlay prefix.  I am not sure
> > how that would be beneficial, as the underlay steers the VPN overlay.
> >
> > So maybe coupling the new VPN overlay GW RIB RT to the transport
> > underlay CT class RIB or BGP-LU RIB may have some benefit, but that
> > would have to be investigated, and I think it is out of scope of the
> > goals of this draft.
> >
> > I think we first have to figure out the authors’ goal and purpose for
> > this draft, and how the GW discovery should work in light of the CT
> > class RIB AFI/SAFI codepoint draft that exists today, as well as the
> > BGP-LU option for next-hop advertisement within the Internet domain.
> >
> > Section 3 comments
> >
> >      “Each GW is configured with an identifier for the SR domain.  That
> >      identifier is common across all GWs to the domain (i.e., the same
> >      identifier is used by all GWs to the same SR domain), and unique
> >      across all SR domains that are connected (i.e., across all GWs to
> >      all SR domains that are interconnected).
> >
> > **No issues with the above**
> >
> >      A route target ([RFC4360]) is attached to each GW's auto-discovery
> >      route and has its value set to the SR domain identifier.
> >
> > **So here, if the RT is attached to the GW auto-discovery route, we
> > need to state that this is the underlay route, and that the PE does a
> > next-hop-self rewrite of the next hop on the eBGP link to the egress
> > domain to its loopback0; so the GW next hop that we are tracking for
> > all the ingress and egress PE domains is the egress and ingress PE
> > loopback0.**
> >
> >      Each GW constructs an import filtering rule to import any route
> >      that carries a route target with the same SR domain identifier
> >      that the GW itself uses.  This means that only these GWs will
> >      import those routes, and that all GWs to the same SR domain will
> >      import each other's routes and will learn (auto-discover) the
> >      current set of active GWs for the SR domain.”
> >
> > **So if this is the case, and we are tracking the underlay RIB and
> > attaching a route target to all the ingress PE and P next hops (which
> > is loopback0), this is literally identical to BGP-LU importing all the
> > loopbacks between domains, or to using CT class.  There is no need for
> > this feature to use the Tunnel Encapsulation attribute.  I am not
> > following why you would not use the BGP-LU or CT class RIB.**
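
A minimal sketch of my reading of the Section 3 import rule, with
hypothetical SR domain identifiers and route records:

```python
# Sketch (illustrative): each GW imports only auto-discovery routes
# carrying an RT equal to its own configured SR domain identifier, so
# GWs of the same SR domain auto-discover each other.

def import_filter(sr_domain_id, routes):
    """Keep routes whose route target carries the same SR domain
    identifier this GW is configured with."""
    return [r for r in routes if r["rt"] == sr_domain_id]

routes = [
    {"gw": "GW1", "rt": "SR-DOM-1"},
    {"gw": "GW2", "rt": "SR-DOM-1"},
    {"gw": "GW4", "rt": "SR-DOM-3"},   # different SR domain, dropped
]

# A GW configured for SR domain 1 discovers GW1 and GW2 (itself
# included), and never imports GW4's route.
discovered = import_filter("SR-DOM-1", routes)
assert [r["gw"] for r in discovered] == ["GW1", "GW2"]
```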
> >
> >   “To avoid the side effect of applying the Tunnel Encapsulation
> >   attribute to any packet that is addressed to the GW itself, the GW
> >   SHOULD use a different loopback address for packets intended for it.”
> >
> > **I don’t understand this statement, as the next hop is the ingress
> > and egress PE loopback0; that is the next hop being tracked for the
> > gateway load balancing.  The subnet between the GW and the PE is not
> > advertised into the Internet domain, because we do next-hop-self on
> > the PE-RR iBGP peering, so the GW-to-PE subnet is not advertised.**
> > Looking at it a second time, I think what is meant here is a BGP-LU
> > inter-AS option C style import of loopbacks between domains: instead
> > of importing the loopback0 which carries all packets on the GW device,
> > use a different loopback on the GW so it does not carry the FEC of all
> > BAU packets, a concept similar to the RSVP-TE to VPN mapping
> > "per-vrf TE" concept.
> >
> >   “As described in Section 1, each GW will include a Tunnel
> >   Encapsulation attribute with the GW encapsulation information for
> >   each of the SR domain's active GWs (including itself) in every route
> >   advertised externally to that SR domain.  As the current set of
> >   active GWs changes (due to the addition of a new GW or the failure/
> >   removal of an existing GW) each externally advertised route will be
> >   re-advertised with a new Tunnel Encapsulation attribute which
> >   reflects current set of active GWs.”
> >
> > **What is the route being advertised externally from the GW?  The
> > routes advertised would be all the PE loopbacks, advertised from both
> > the ingress and egress domains into the Internet domain, and all the
> > loopbacks from the Internet domain into the ingress and egress
> > domains, which could be done via the BGP-LU or CT RIB; no need to
> > reinvent the wheel and create a new RIB.  So the BGP-LU or CT RIB
> > tracks the current set of active GW next-hop loopbacks between
> > domains.  If you do SR-TE stitching, then you can do next-hop-self on
> > each PE-RR peering for the load balancing; that would work, and the
> > load balancing would be to the PE loopbacks.  Or, for an end-to-end
> > SR-TE path using the BGP-LU or CT RIB, by importing all the PE
> > loopbacks between domains, the current set of active GWs would be
> > tracked via the BGP-LU or CT RIB.  So if the active GWs change due to
> > GW failures, they would be withdrawn from the BGP-LU or CT underlay
> > RIB.  No need now for the Tunnel Encapsulation attribute, at least for
> > the GW auto-discovery load balancing.**
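
To illustrate the contrast I am drawing, a small sketch of the two
update models; all names, prefixes, and RIB entries are illustrative:

```python
# Sketch: with the draft's approach, every externally advertised route
# is re-advertised with a new Tunnel Encapsulation attribute whenever
# the active GW set changes; with a BGP-LU/CT-style underlay RIB, a
# failed GW's loopback is simply withdrawn and overlay routes are left
# untouched.

def tunnel_encap_readvertise(external_routes, active_gws):
    """Draft approach: each route carries the full current GW set, so a
    GW change touches every externally advertised route."""
    return [{"prefix": r, "tunnel_tlvs": list(active_gws)}
            for r in external_routes]

def underlay_rib_update(rib, failed_gw):
    """BGP-LU/CT approach: one withdrawal removes the failed GW's
    loopback from the transport RIB."""
    return {nh: gw for nh, gw in rib.items() if gw != failed_gw}

rib = {"PE1-lo0": "GW1", "PE2-lo0": "GW2"}
rib = underlay_rib_update(rib, "GW1")   # GW1 fails: single withdrawal
assert rib == {"PE2-lo0": "GW2"}

# The draft's model instead rewrites every external route's attribute.
routes = tunnel_encap_readvertise(["X", "Y"], ["GW2"])
assert all(r["tunnel_tlvs"] == ["GW2"] for r in routes)
```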
> >
> > I think it may still be possible to retrofit this draft to utilize
> > the CT RIB or BGP-LU for the GW load balancing, so nothing new has to
> > be designed as far as the underlay goes.  However, for the idea of
> > providing some visibility into the VPN overlay route from the
> > underlay, there may be some benefit in using the Tunnel Encapsulation
> > attribute RT import policy attached to the VPN overlay prefixes.
> >
> > As the CT draft provides a complete solution, with per-VPN or
> > per-prefix underpinning of the VPN overlay to the underlay CT RIB, the
> > problem statement is completely solved with either the CT draft or
> > BGP-LU.
> >
> > Minor issues:
> > None
> >
> > Nits/editorial comments:
> >
> > Please add normative and informative references below.
> >
> > I would reference, as normative or at least informative, the CT class
> > draft, which creates a new transport class.  I think this draft can
> > work well in conjunction with the CT class, coupling the GW RIB
> > created here to the CT class transport RIB and providing the
> > end-to-end inter-AS stitching via the PCECC controller.  I am one of
> > the co-authors of the CT draft, and I think it could be coupled with
> > this GW draft to provide the overall goals of selective GW load
> > balancing.
> >
> >
> > https://tools.ietf.org/html/draft-kaliraj-idr-bgp-classful-transport-planes-07
> >
> > I would also reference this draft for CT class PCEP coloring extension.
> >
> > https://tools.ietf.org/html/draft-rajagopalan-pcep-rsvp-color-00
> >
> > As this solution would utilize a centralized PCECC controller for
> > inter-AS path instantiation for the GW load balancing, I think it
> > would be a good idea to reference the PCECC, H-PCE, inter-AS PCE, and
> > PCEP SR extension documents as informative, and maybe even normative,
> > references.
> >
> >
> >
> > --
> > last-call mailing list
> > last-c...@ietf.org
> > https://www.ietf.org/mailman/listinfo/last-call
>
> --


Gyan Mishra

Network Solutions Architect

Email gyan.s.mis...@verizon.com

M 301 502-1347
_______________________________________________
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess
