Eric,

Thanks for your comments. I received them earlier as well. We are looking into them and will get back to you shortly.
Saud

-----Original Message-----
From: Eric Rosen [mailto:[email protected]]
Sent: Monday, June 24, 2013 5:51 PM
To: Pavan Kurapati
Cc: [email protected]; [email protected]; Marco Rodrigues; ASIF, SAUD; [email protected]
Subject: Comments on draft-kurapati-dynamicrp-bgpmvpn-00.txt

Stig Venaas and I have discussed draft-kurapati-dynamicrp-bgpmvpn-00.txt, and together we have prepared the following set of questions and comments.

- We are wondering why the draft proposes to use a new SAFI, rather than reusing the MCAST-VPN SAFI. Using a new SAFI does provide a bit more freedom in designing the NLRI, but the draft sticks to the basic NLRI format of the MCAST-VPN SAFI anyway. I don't think the new SAFI would be used on a BGP session unless the MCAST-VPN SAFI is also used on that session, so why not just use the MCAST-VPN SAFI for the new route types?

- The draft seems to allow an AFI of "IPv6" to be used together with an IPv4 BSR address, or vice versa. When using the type 1-7 routes of the MCAST-VPN SAFI, the AFI designates the address family being used by the customer. The address family being used by the service provider is inferred from the various length computations that are discussed in RFC 6515. It seems best to stick with that same convention for the new BSR route types. That would mean that the field "BSR Address" must be of the address family identified by the AFI. The address length would then have to be appropriate for that address family, or the Update would be considered malformed. On the other hand, if it were to be decided to use a new SAFI, it might make more sense to dispense with the RFC 6515 hacks altogether and explicitly encode the address family of each address.

- For RP addresses and Group addresses the draft proposes to use the "encoded" formats from the PIM spec. These formats contain an octet that identifies the address family.
There should be a requirement that the address family as encoded in the "encoded format" be the same as the address family identified in the BGP Update's AFI, and that the lengths be appropriate for that address family.

- In the BGP Update, parsing would be simpler if the Length field that precedes an "encoded group format" field or an "encoded unicast address field" contained the length of that field, not the length of the address prefix that appears within the encoded format.

- The mention of the "VRF Route Import Extended Community" in section 4.1 should say "VRF Route Import Extended Community" or "VRF Route Import IPv6 Address Specific Extended Community", to cover the case of an SP with an IPv6 infrastructure. (It also needs to be made clear that this applies to the NLRI of sections 4.2 and 4.3 as well.)

- What action is to be taken if a BGP Update with an MCAST-VPN-BSR NLRI is received, but there is no BSR-BGP Path attribute?

- It's hard to interpret phrases like "the group count for this NLRI is not set". How does one send this attribute without "setting" all its fields? Does "not set" just mean "set to zero", or does it mean only that certain fields are irrelevant to the processing of certain NLRIs?

- The draft could use a little table to show which fields affect the processing of which received NLRIs:

    NLRI                    FragTag   RP Count   Group Count
    BSR Parameters          Yes       No         Yes
    BSM Group Parameters    Yes       Yes        No
    BSM RP Parameters       Yes       No         No

  I think this table corresponds to your intentions. Where a particular NLRI/field combination is "No", perhaps what the draft should say is that the field MUST be ignored when processing that type of NLRI. That would allow the ignored field to carry any value, without risk of any interoperability problems. If one only says "SHOULD be ignored", there may be interoperability problems.
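  To illustrate the "MUST be ignored" rule, here is a minimal sketch of how a receiver might apply the table above. The dictionary, the field names (frag_tag, rp_count, group_count) and the function name are hypothetical and do not come from the draft; the Yes/No entries are copied from the table.

```python
# Hypothetical names; the draft does not define these identifiers.
# Each NLRI type maps to the attribute fields marked "Yes" in the table.
RELEVANT_FIELDS = {
    "BSR Parameters": {"frag_tag", "group_count"},
    "BSM Group Parameters": {"frag_tag", "rp_count"},
    "BSM RP Parameters": {"frag_tag"},
}


def relevant_attribute_fields(nlri_type, attribute):
    """Return only the attribute fields that affect processing of nlri_type.

    Fields marked "No" in the table are dropped regardless of their value,
    which is what "MUST be ignored" would mean for a receiver.
    """
    wanted = RELEVANT_FIELDS[nlri_type]
    return {name: value for name, value in attribute.items() if name in wanted}


# A sender may put arbitrary values in the ignored fields.
received = {"frag_tag": 7, "rp_count": 99, "group_count": 42}
print(relevant_attribute_fields("BSM RP Parameters", received))
```

  With this approach a receiver behaves the same whether an ignored field carries zero or any other value, which is the interoperability property the "MUST" wording is meant to guarantee.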
- Regarding Fragmentation Tags

  There don't seem to be any clear instructions as to when the fragmentation tag field of the BSR-BGP attribute of a given NLRI actually needs to be changed. As a result, it's difficult to figure out its uses. If some customer is sending fragmented BSMs every minute, one doesn't want to have BGP update all its RP mappings every minute. So just when does the attribute value have to change? Hopefully not too often, or there will be a lot of BGP thrashing.

  It's difficult to understand why a fragmentation tag field is needed in the BSR-BGP attribute at all. The Group Count and RP Count fields are really what control when an egress PE can send a BSM. If an ingress PE doesn't advertise changes to a group's RP mappings until it has all the mappings for that group (which I think is required in BSR), why can't fragmentation be entirely a local matter (i.e., not communicated across the net)? What are we missing?

- Constructing BSMs from the Counts

  Suppose an ingress PE receives a BSM with 15 RP mappings for a given group. Then it receives another BSM with 15 RP mappings for that group, 10 of which are the same, and 5 of which are different. It seems that if the egress PE receives "withdraw, update, withdraw, update, withdraw, update, withdraw, update, withdraw, update", it could generate five BSMs. Is our understanding correct, or are we missing something?

- End of RIB

  Given that there is almost always a route reflector between the ingress and egress PEs, how is the "End of RIB" marker going to be helpful in deciding when to originate a BSM?

- BS_Timeout

  There seems to be a problem with the following procedure from section 6.2.2 ("Missing BSM"):

    "Egress PE receiving a withdrawn "BSR Parameters" route (Type-1) MUST still keep the corresponding Type-2 and Type-3 entries. However, it MUST NOT advertise the BSM to the CE without the Type-1 route present.
    As soon as the Type-1 is withdrawn, BS_Timeout period has to be started at the egress and upon its expiry, all the Type-2 and Type-3 entries MUST be deleted.

    Say the egress has generated BSM at t=0. At t=1 BS_Period expired at ingress PE and ingress PE did not get the periodic BSM. So, it withdraws type-1 (BSR Parameters). Egress PE has already generated BSM just before the type-1 withdrawal was received. The egress PE skips the next periodic BSM towards the CE. But CE is "off" by BS_Period interval by now. Once the BS_Timeout expires, egress PE removes all the type-2 and type-3 entries. CEs connected to egress PE will remove the same, a whole BS_Period later.

    Hence, to avoid this issue, once the BS_Timeout expires, an egress PE MUST generate a new BSM towards CE with RP hold time set to "0" for all the type-2 and type-3 entries. This will make the CEs in sync with the PEs. After generating the BSM, PE removes all the Type-2 and Type-3 entries as stated above."

  The problem is the following. The holding times of the individual RP mapping entries may be longer than the BS_Timeout. Typically if BS_Timeout fires, the remaining holding time of an RP mapping entry will be the difference between (a) its holding time as reported in the last received BSM and (b) BS_Timeout. The above seems to set the RP holding times to zero as soon as BS_Timeout expires. The problem with this is that it may cause the RP mappings to time out before a new BSR can be elected.

  Perhaps the withdrawal of a BSR parameters route should trigger the transmission of a new BSM that doesn't set the RP-mapping holding times to zero, but that just reduces each RP-mapping holding time by BS_Timeout. Well, that would correct the RP-mapping holding times downstream of an egress PE, but it would also have the side effect of restarting the BS_Timeout at the routers downstream of the egress PE. So that doesn't seem right either.
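  To make the holding-time arithmetic above concrete, here is a small sketch of the "typical" case, in which the remaining holding time of an RP mapping when BS_Timeout fires is its last reported holding time minus BS_Timeout. The function name and the numeric values are illustrative assumptions, not taken from the draft.

```python
def remaining_holdtime(reported_holdtime_s: int, bs_timeout_s: int) -> int:
    """Holding time (seconds) left on an RP mapping when BS_Timeout fires.

    Assumes the mapping's holding timer and BS_Timeout started together,
    at the last received BSM. Never returns a negative value.
    """
    return max(0, reported_holdtime_s - bs_timeout_s)


# Illustrative values only (hypothetical, not from the draft).
reported_holdtime = 150  # holding time carried in the last BSM
bs_timeout = 130         # BS_Timeout at the egress PE

left = remaining_holdtime(reported_holdtime, bs_timeout)
print(left)  # 20
```

  With these numbers the mapping would normally stay valid for another 20 seconds after BS_Timeout fires, whereas a BSM with the hold time set to "0" removes it immediately.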
- With regard to the sentence "As soon as the Type-1 is withdrawn, BS_Timeout period has to be started at the egress and upon its expiry, all the Type-2 and Type-3 entries MUST be deleted", it doesn't seem right for an egress PE to remove BGP-installed routes based upon the expiry of a local timer.
