Comments on draft-kurapati-dynamicrp-bgpmvpn-00.txt

Eric Rosen Mon, 24 Jun 2013 14:52:17 -0700

Stig Venaas and I have discussed draft-kurapati-dynamicrp-bgpmvpn-00.txt,
and together we have prepared the following set of questions and comments.


- We are wondering why the draft proposes to use a new SAFI, rather than
  reusing the MCAST-VPN SAFI.
  
  Using a new SAFI does provide a bit more freedom in designing the NLRI,
  but the draft sticks to the basic NLRI format of the MCAST-VPN SAFI
  anyway.  I don't think the new SAFI would be used on a BGP session unless
  the MCAST-VPN SAFI is also used on that session, so why not just use the
  MCAST-VPN SAFI for the new route types?

- The draft seems to allow an AFI of "IPv6" to be used together with an IPv4
  BSR address, or vice versa.

  When using the type 1-7 routes of the MCAST-VPN SAFI, the AFI designates
  the address family being used by the customer.  The address family being
  used by the service provider is inferred from the various length
  computations that are discussed in RFC 6515.  It seems best to stick with
  that same convention for the new BSR route types.  That would mean that
  the field "BSR Address" must be of the address family identified by the
  AFI.  The address length would then have to be appropriate for that
  address family, or the Update would be considered malformed.

  On the other hand, if it were to be decided to use a new SAFI, it might
  make more sense to dispense with the RFC 6515 hacks altogether and
  explicitly encode the address family of each address.

- For RP addresses and Group addresses the draft proposes to use the
  "encoded" formats from the PIM spec.  These formats contain an octet that
  identifies the address family.  There should be a requirement that the
  address family as encoded in the "encoded format" be the same as the
  address family identified in the BGP Update's AFI, and that the lengths be
  appropriate for that address family.
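
  As a rough sketch of the check we have in mind (the function name and
  error handling are ours, not the draft's), a receiver could validate an
  "encoded unicast address" against the AFI as follows.  It assumes the PIM
  encoding of one Addr Family octet, one Encoding Type octet, then the
  address, and relies on the AFI and the PIM Addr Family octet both using the
  IANA address family numbers (1 = IPv4, 2 = IPv6).

```python
import struct

# IANA address family numbers, used both as the BGP AFI and as the
# PIM "Addr Family" octet: 1 = IPv4, 2 = IPv6.
ADDR_LEN = {1: 4, 2: 16}


def check_encoded_unicast(afi: int, field: bytes) -> bytes:
    """Return the raw address if `field` is consistent with `afi`.

    `field` is assumed to be a PIM encoded unicast address:
    Addr Family (1 octet), Encoding Type (1 octet), address.
    Raises ValueError where the Update would be considered malformed.
    (Hypothetical helper, for illustration only.)
    """
    if afi not in ADDR_LEN:
        raise ValueError(f"unsupported AFI {afi}")
    if len(field) < 2:
        raise ValueError("encoded address is truncated")
    family, _encoding_type = struct.unpack_from("!BB", field)
    if family != afi:
        raise ValueError(f"encoded family {family} does not match AFI {afi}")
    address = field[2:]
    if len(address) != ADDR_LEN[afi]:
        raise ValueError(f"address length {len(address)} is wrong for AFI {afi}")
    return address
```

  The same length table would serve for the "BSR Address" check discussed
  in the previous comment.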

- In the BGP Update, parsing would be simpler if the Length field that
  precedes an "encoded group format" field or an "encoded unicast address"
  field contained the length of that field, not the length of the address
  prefix that appears within the encoded format.
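
  To illustrate why, here is a minimal sketch of a parser for the case where
  each Length gives the size of the whole field that follows.  The one-octet
  Length counted in octets is our assumption for the example, and the
  function name is hypothetical.  The point is that the fields can be split
  apart without looking inside the encoded formats at all.

```python
from typing import List


def split_fields(buf: bytes) -> List[bytes]:
    """Split `buf` into fields, each preceded by a one-octet Length that
    gives the size in octets of the whole field that follows it."""
    fields = []
    pos = 0
    while pos < len(buf):
        length = buf[pos]
        pos += 1
        if pos + length > len(buf):
            raise ValueError("field runs past the end of the NLRI")
        fields.append(buf[pos:pos + length])
        pos += length
    return fields
```

  If the Length instead gave the length of the address prefix, the parser
  would first have to work out the size of the encoded-format header for
  each field type before it could find the next field.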

- The mention of the "VRF Route Import Extended Community" in section 4.1
  should say "VRF Route Import Extended Community" or "VRF Route Import IPv6
  Address Specific Extended Community", to cover the case of a SP with an
  IPv6 infrastructure.  (It also needs to be made clear that this applies to
  the NLRI of sections 4.2 and 4.3 as well.)

- What action is to be taken if a BGP Update with an MCAST-VPN-BSR NLRI is
  received, but there is no BSR-BGP Path attribute?

- It's hard to interpret phrases like "the group count for this NLRI is not
  set".  How does one send this attribute without "setting" all its fields?
  Does "not set" just mean "set to zero", or does it mean only that certain
  fields are irrelevant to the processing of certain NLRIs?

- The draft could use a little table to show which fields affect the
  processing of which received NLRIs:

          NLRI                          FragTag      RP Count    Group Count

          BSR Parameters                Yes          No          Yes

          BSM Group Parameters          Yes          Yes         No

          BSM RP Parameters             Yes          No          No

  I think this table corresponds to your intentions.

  Where a particular NLRI/field combination is "No", perhaps what the draft
  should say is that the field MUST be ignored when processing that type of
  NLRI.  That would allow the ignored field to carry any value, without risk
  of any interoperability problems.  If one only says "SHOULD be ignored",
  there may be interoperability problems.
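
  A sketch of how a receiver might apply the table, assuming our reading of
  it is right.  The class, field, and function names are hypothetical.  Only
  the fields marked "Yes" are ever looked at, so whatever a sender puts in a
  "No" field cannot affect processing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BsrBgpAttr:
    """The three fields of the BSR-BGP attribute discussed above."""
    frag_tag: int
    rp_count: int
    group_count: int


# Fields that affect processing, per NLRI type (the "Yes" cells of the table).
RELEVANT_FIELDS = {
    "BSR Parameters": ("frag_tag", "group_count"),
    "BSM Group Parameters": ("frag_tag", "rp_count"),
    "BSM RP Parameters": ("frag_tag",),
}


def relevant_values(nlri_type: str, attr: BsrBgpAttr) -> dict:
    """Return only the attribute fields that matter for `nlri_type`;
    every other field is ignored, whatever value it carries."""
    return {name: getattr(attr, name) for name in RELEVANT_FIELDS[nlri_type]}
```

  With this, two attributes that differ only in RP Count are treated
  identically when they accompany a BSR Parameters route.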

- Regarding Fragmentation Tags

  There don't seem to be any clear instructions as to when the fragmentation
  tag field of the BSR-BGP attribute of a given NLRI actually needs to be
  changed.  As a result, it's difficult to figure out its uses.  If some
  customer is sending fragmented BSMs every minute, one doesn't want to have
  BGP update all its RP mappings every minute.  So just when does the
  attribute value have to change?  Hopefully not too often, or there will be
  a lot of BGP thrashing.

  It's difficult to understand why a fragmentation tag field is needed in the
  BSR-BGP attribute at all.  The Group Count and RP Count fields are really
  what control when an egress PE can send a BSM.  If an ingress PE doesn't
  advertise changes to a group's RP mappings until it has all the mappings
  for that group (which I think is required in BSR), why can't fragmentation
  be entirely a local matter (i.e., not communicated across the net)?  What
  are we missing?

- Constructing BSMs from the Counts

  Suppose an ingress PE receives a BSM with 15 RP mappings for a given
  group.  Then it receives another BSM with 15 RP mappings for that group,
  10 of which are the same, and 5 of which are different.

  It seems that if the egress PE receives "withdraw, update, withdraw,
  update, withdraw, update, withdraw, update, withdraw, update", it could
  generate five BSMs.  Is our understanding correct, or are we missing
  something?
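
  The scenario can be sketched as a small simulation.  It assumes (this is
  our guess at the intended procedure, not something the draft states in
  these terms) that the egress PE originates a BSM for the group whenever
  the number of RP mappings it holds equals the advertised RP Count.  All
  names below are hypothetical.

```python
def count_bsms(rp_count, initial, events):
    """Count the BSMs a naive egress PE would originate for one group.

    `initial` is the set of RP mappings already held, and `events` is a
    sequence of ("withdraw" | "update", rp) pairs in arrival order.
    Assumed rule: originate a BSM whenever exactly `rp_count` mappings
    are held after processing an event.
    """
    held = set(initial)
    bsms = 0
    for action, rp in events:
        if action == "withdraw":
            held.discard(rp)
        else:
            held.add(rp)
        if len(held) == rp_count:
            bsms += 1
    return bsms


# 15 mappings held; 5 of them are then replaced one at a time.
initial = {f"rp{i}" for i in range(1, 16)}
interleaved = []
for i in range(1, 6):
    interleaved.append(("withdraw", f"rp{i}"))
    interleaved.append(("update", f"new{i}"))

# The same changes, but with all withdraws arriving before all updates.
batched = [e for e in interleaved if e[0] == "withdraw"] + \
          [e for e in interleaved if e[0] == "update"]
```

  Under that assumption, count_bsms(15, initial, interleaved) is 5, while
  the batched ordering yields a single BSM, so the number of BSMs would
  depend on how BGP happens to order the withdraws and updates.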

- End of RIB

  Given that there is almost always a route reflector between the ingress
  and egress PEs, how is the "End of RIB" marker going to be helpful in
  deciding when to originate a BSM?

- BS_Timeout

  There seems to be a problem with the following procedure from section
  6.2.2 ("Missing BSM"):

       "Egress PE receiving a withdrawn "BSR Parameters" route (Type-1)
       MUST still keep the corresponding Type-2 and Type-3 entries.
       However, it MUST NOT advertise the BSM to the CE without the
       Type-1 route present.  As soon as the Type-1 is withdrawn,
       BS_Timeout period has to be started at the egress and upon its
       expiry, all the Type-2 and Type-3 entries MUST be deleted.

       Say the egress has generated BSM at t=0.  At t=1 BS_Period
       expired at ingress PE and ingress PE did not get the periodic
       BSM.  So, it withdraws type-1 (BSR Parameters).  Egress PE has
       already generated BSM just before the type-1 withdrawal was
       received.  The egress PE skips the next periodic BSM towards the
       CE.  But CE is "off" by BS_Period interval by now.  Once the
       BS_Timeout expires, egress PE removes all the type-2 and type-3
       entries.  CEs connected to egress PE will remove the same, a
       whole BS_Period later.  Hence, to avoid this issue, once the
       BS_Timeout expires,an egress PE MUST generate a new BSM towards
       CE with RP hold time set to "0" for all the type-2 and type-3
       entries.  This will make the CEs in sinc with the the PEs.  After
       generating the BSM, PE removes all the Type-2 and Type-3 entries
       as stated above."

   The problem is the following.  The holding times of the individual RP
   mapping entries may be longer than the BS_Timeout.  Typically if
   BS_Timeout fires, the remaining holding time of an RP mapping entry will
   be the difference between (a) its holding time as reported in the last
   received BSM and (b) BS_Timeout.  The above seems to set the RP holding
   times to zero as soon as BS_Timeout expires.  The problem with this is
   that it may cause the RP mappings to time out before a new BSR can be
   elected.

   Perhaps the withdrawal of a BSR parameters route should trigger the
   transmission of a new BSM that doesn't set the RP-mapping holding times
   to zero, but that just reduces each RP-mapping holding time by
   BS_Timeout.  Well, that would correct the RP-mapping holding times
   downstream of an egress PE, but it would also have the side effect of
   restarting the BS_Timeout at the routers downstream of the egress PE.  So
   that doesn't seem right either.
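
   To make the arithmetic concrete, here is a small worked example.  The
   function name is ours, and the numbers (an RP holding time of 150
   seconds and a BS_Timeout of 130 seconds) are illustrative values only.

```python
def remaining_holdtime(reported_holdtime: int, bs_timeout: int) -> int:
    """Holding time (seconds) left on an RP mapping when BS_Timeout fires,
    taken as the holding time reported in the last received BSM minus
    BS_Timeout, and never less than zero."""
    return max(reported_holdtime - bs_timeout, 0)


# Illustrative values: the mapping still has 20 seconds to live when
# BS_Timeout fires, whereas the draft's procedure would advertise 0.
left = remaining_holdtime(reported_holdtime=150, bs_timeout=130)
```

   With longer holding times the gap between the two behaviors grows, and
   that gap is time in which a new BSR could otherwise have been elected.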

- With regard to the sentence "As soon as the Type-1 is withdrawn,
  BS_Timeout period has to be started at the egress and upon its expiry, all
  the Type-2 and Type-3 entries MUST be deleted", it doesn't seem right for
  an egress PE to remove BGP-installed routes based upon the expiry of a
  local timer.
