Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

Wen Lin Tue, 28 Apr 2020 09:28:22 -0700

I am fine with changing this field to a reserved field.  Each originator MUST 
set the reserved field to zero and the reserved field is not a part of the key 
going forward.  RR propagates the route as it was received.

Thanks,
Wen

From: "Ali Sajassi (sajassi)" <[email protected]>
Date: Tuesday, April 28, 2020 at 12:04 PM
To: John Scudder <[email protected]>
Cc: "Mankamana Mishra (mankamis)" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" 
<[email protected]>, "Jakob Heitz (jheitz)" 
<[email protected]>
Subject: Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)
Resent-From: <[email protected]>
Resent-To: <[email protected]>, <[email protected]>, <[email protected]>, 
<[email protected]>, <[email protected]>
Resent-Date: Tuesday, April 28, 2020 at 12:03 PM

Hi John,

To accommodate Jakob’s and Jeff’s comments, we can say:

“This field is not part of the route key. The originator MUST set the reserved 
field to Zero (because this field used to be part of the route key), the 
receiver SHOULD ignore it and if it needs to be propagated, it MUST propagate 
it unchanged”.
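
A minimal sketch of how the proposed wording might look in code, purely for illustration. The class and function names are hypothetical, the fields before the reserved field are treated as one opaque blob, and treating only those fields as the route key is an assumption of this sketch rather than text from the draft.

```python
from dataclasses import dataclass

RESERVED_OCTETS = 4  # the former 4-octet Leave Synchronization Number


@dataclass(frozen=True)
class LeaveSynchRoute:
    key_fields: bytes  # everything before the reserved field (opaque here)
    reserved: bytes    # 4 octets, carried but never interpreted
    trailer: bytes     # Maximum Response Time + Flags (2 octets)

    def route_key(self) -> bytes:
        # The reserved field is deliberately left out of the key.
        return self.key_fields


def originate(key_fields: bytes, max_response_time: int, flags: int) -> LeaveSynchRoute:
    # Originator MUST set the reserved field to zero.
    return LeaveSynchRoute(key_fields, bytes(RESERVED_OCTETS),
                           bytes([max_response_time, flags]))


def propagate(route: LeaveSynchRoute) -> bytes:
    # A route reflector re-encodes the route exactly as received,
    # including whatever value the reserved field carried.
    return route.key_fields + route.reserved + route.trailer
```

With this shape, two routes that differ only in the reserved field compare equal on `route_key()`, and a non-zero reserved value received from an older implementation is not an error and survives reflection unchanged.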

Once the new rev is published, we can comment and fine tune the text.

Cheers,
Ali

From: John Scudder <[email protected]>
Date: Tuesday, April 28, 2020 at 8:50 AM
To: Cisco Employee <[email protected]>
Cc: "Mankamana Mishra (mankamis)" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" 
<[email protected]>, "Jakob Heitz (jheitz)" 
<[email protected]>
Subject: Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

Hi Ali,

Yes, making the field reserved would be fine from my point of view, thanks.

Also to repeat the point that was raised at the mic during the meeting just 
now, there should be some text to the effect of “this field is reserved. It 
MUST be transmitted as zero. Any value MUST be accepted on receipt.”

Jakob added this comment in the WebEx chat: “Perhaps also, if propagated, it 
must be propagated the way it was received. For example RR”. That seems 
reasonable to me for an RR (or other device that propagates routes without 
consuming them, sometimes ASBRs can do this). So maybe this text? Jakob, what 
do you think?

“This field is reserved. It MUST be originated as zero. Any value MUST be 
accepted on receipt.”

I changed “transmitted” to “originated” to try to capture the distinction. I 
guess there might need to be some consideration of what exactly “accepted” 
means. I don’t mean that it should be part of the key, I just mean it shouldn’t 
be considered an error.

In any case this is a small detail to be ironed out in the final text, the 
approach Ali proposes is fine.

—John

On Apr 28, 2020, at 1:08 AM, Ali Sajassi (sajassi) 
<[email protected]<mailto:[email protected]>> wrote:

Hi John,

My objection to using two code points for RT-8 and migrating from old to new is 
twofold:

  1.  As I explained in my previous emails, there are vendors who have 
implemented both formats based on a single code point after our meeting around 
IETF 106 and I don’t want to have them yet again do another implementation. 
Basically, it is not fair to them.
  2.  One of the fundamental premises of EVPN is minimizing configuration, and 
that’s why we have lots of auto-derivation / auto-configuration features in 
EVPN – e.g., auto-derivation of ESIs and auto-derivation of RTs, etc. I really 
don’t want to add any extra per-peer BGP configuration knobs for which format 
a peer supports, considering how much effort has gone into the EVPN protocol to 
minimize configuration.

AFAIK, there is only one vendor who implemented ONLY the existing format with 
deployments (your vendor) and there is only one vendor who implemented ONLY the 
new format with deployments (my vendor). Other vendors that I know (and we need 
to poll again) have either implemented both formats with a single code point or 
haven’t done the implementation yet.

So, after discussing the situation with my development team today, we are OK 
with using the old format and upgrading our field deployments to it. This 
way we can avoid using two different code points and all the intricacies that 
come with them – i.e., additional configuration knobs and migration stuff. So, 
we change the “Leave Synchronization Number” field to a 4-byte reserved field 
(to be set to zero upon transmission and ignored when received). It should also 
not be part of route-key processing, of course.

Mankamana will present a few slides on this tomorrow and we will seek WG 
consensus on this. Once we have consensus, then the update of the draft is 
rather easy and we will move quickly on it.

Cheers,
Ali

From: John Scudder <[email protected]<mailto:[email protected]>>
Date: Monday, April 27, 2020 at 8:56 AM
To: Cisco Employee <[email protected]<mailto:[email protected]>>
Cc: "Mankamana Mishra (mankamis)" 
<[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)
Resent-From: <[email protected]<mailto:[email protected]>>
Resent-To: Cisco Employee <[email protected]<mailto:[email protected]>>, 
<[email protected]<mailto:[email protected]>>, 
<[email protected]<mailto:[email protected]>>, 
<[email protected]<mailto:[email protected]>>, 
<[email protected]<mailto:[email protected]>>
Resent-Date: Monday, April 27, 2020 at 8:56 AM

Hi Ali,

Yes, of course the current sad situation requires operators to exercise extra 
care (“good operational procedures”). The discussion is whether this shall 
continue indefinitely into the future (if we keep using the same code point) or 
if it has a clear end (if we move to a standardized code point).

As far as I can tell your argument for continuing to use code point 8 is “it 
would be extra work to move”. Well, yes. Progress requires effort. It’s not a 
lot of effort but it’s more than nil.

Regarding “we thought we had unanimous agreement” I am surprised to see this 
raised as some kind of justification, considering that the field has been 
present since -01 of the individual draft in October 2016, was in there for 
WGLC, and remains there up to this moment. As far as I can tell the first time 
the proposal to remove the field was discussed in the WG was at the IETF-106 
meeting, and I haven’t seen any consensus (either formal or informal) on the 
mailing list since then. Reviewing the recording of the 106 meeting, most of 
the issues I’ve raised were also raised then, in particular by Jeff Haas and 
Keyur Patel, who took some care to explain why it’s a problem from the BGP 
protocol operation side. Presenting this as a fait accompli based on private 
conversations would be poor form even if the proposal didn’t have technical 
deficiencies that were pointed out months ago.

Just because you can hold things together with spit and baling wire, doesn’t 
mean you should settle for that. I am perplexed as to why you’re so opposed to 
doing what is both the normal and the right thing.

—John

On Apr 26, 2020, at 6:07 PM, Ali Sajassi (sajassi) 
<[email protected]<mailto:[email protected]>> wrote:

Hi John,

I think we need good operational procedures similar to what we did for RT-4 
regardless of what approach we take, because currently we have two deployments 
(by two vendors) that use RT-8 with two different lengths. Without a proper 
procedure, mixing these boxes can cause issues such as BGP session resets 
(which you also pointed out previously). So, I believe we need to have a proper 
procedure while we are upgrading them to interoperate with each other. And for 
interoperability, let me categorize the use of two different code points as 
the 3rd option. So for the sake of completeness, let me repeat them here:

  1.  Just go with the new format and, for multi-vendor deployments, make sure 
the new format is used. Considering the current deployment situation, where 
intra-DC and intra-site deployments use a single vendor but different vendors 
are used for different sites and DCs, this can be feasible. Maybe the current 
deployment model is why we haven’t run into interop issues.
  2.  Accommodate both lengths (i.e., allow both lengths without necessarily 
supporting both formats) and turn on RT-constraint on the PEs that support the 
old RT-8 format. This way, the RR can properly reflect both RT-8 formats. The 
PEs supporting the new format can be inserted into the network without issue. 
And the PEs supporting the old format can be gradually migrated to the new 
format.
  3.  Use a new code point for the new format; the new PEs need to support 
both code points, and then the old code point is deprecated.

If we look at the vendor situation (AFAIK), since the IETF meeting in November, 
all but one of the vendors that have implemented this feature have upgraded 
their implementations to support either both formats or both lengths, because 
we thought we had a unanimous agreement. So, that means all vendors except one 
can do options 1 and 2. Now if we are asking everyone to implement option 3, 
then that would impose an additional burden on the vendors that have already 
implemented support for both formats/lengths with the same code point. I agree 
that if we weren’t in the current situation, option 3 would have been somewhat 
cleaner, but at this point, if we go with option 3, we will be asking these 
vendors to do yet another implementation.

With regard to my RT-constraint comment, allow me to clarify it as follows: The 
RT-8 is only intended to be exchanged among multi-homing PEs, and 99% of 
multi-homing scenarios are dual-homing. Furthermore, the dual-homing PEs are 
from the same vendor. This means when this route is advertised by a PE in an 
EVPN network that has 100 PEs, it uses a route target that is imported by only 
one other PE. So, in a network with PE1 to PE100 where PE1 and PE2 are 
dual-homed and PE1 advertises this route, only PE2 needs to import this route 
and all other PEs need to discard it when they receive it. So, let’s assume we 
have a network where PE1 to PE50 run the old format and PE51 through PE100 run 
option 2 (e.g., they support either both formats or both lengths). Then, when 
PE1 wants to advertise an RT-8 intended for PE2, it will be received by PE3 to 
PE100 and they will discard the route. Now, we need to make sure that if PE100 
advertises a route for PE99 (its dual-homing counterpart) with the new format, 
it doesn’t cause an issue for PE1 to PE50. These PEs can use RT-constraint to 
have the RR send only the routes that they have imports for. So, PE1 will not 
receive the RT-8 route from PE100, and that route cannot cause it any issue.
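
The PE1 to PE100 example can be sketched as a toy model. This is only an illustration of the intended filtering, not of any real RR implementation: the function name, the pairing of PEs, and the route-target strings are all made up for the sketch, and it says nothing about whether RT-constraint is always configured correctly in a real mixed network.

```python
def rr_receivers(route_rt, sender, imports, rtc_peers):
    """PEs to which the RR reflects a route carrying route target `route_rt`."""
    receivers = set()
    for pe, import_rts in imports.items():
        if pe == sender:
            continue
        # A peer using RT-constraint only gets routes whose RT it imports.
        if pe in rtc_peers and route_rt not in import_rts:
            continue
        receivers.add(pe)
    return receivers


# PE1..PE100, dual-homed in pairs (PE1/PE2, ..., PE99/PE100), one RT per pair.
imports = {pe: {f"rt-pair-{(pe + 1) // 2}"} for pe in range(1, 101)}
old_format_pes = set(range(1, 51))  # PE1..PE50: old format, RT-constraint on

# New-format route from PE100, intended for PE99.
from_pe100 = rr_receivers("rt-pair-50", 100, imports, old_format_pes)
# Old-format route from PE1, intended for PE2.
from_pe1 = rr_receivers("rt-pair-1", 1, imports, old_format_pes)
```

In this model the new-format route from PE100 reaches only PE51 to PE99, so none of the old-format PEs ever has to parse it, while the old-format route from PE1 reaches PE2 plus PE51 to PE100, which is why those PEs must at least tolerate the old length.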

Regards,
Ali

From: John Scudder <[email protected]<mailto:[email protected]>>
Date: Sunday, April 26, 2020 at 12:33 PM
To: Cisco Employee <[email protected]<mailto:[email protected]>>
Cc: "Mankamana Mishra (mankamis)" 
<[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

Hi Ali,

Your option 1 is substantially what I proposed, the sole difference being that 
I propose following normal IETF procedure and moving to a new code point. 
Without moving to a new code point, the only thing standing in the way of a 
catastrophe is luck and good operational procedures, hardly a robust option. 
With moving to a new code point, there’s literally no way to trigger this 
scenario.

It’s the safer thing to do and the right thing to do. The code’s not hard, I’m 
tempted to call it trivial. We do this kind of thing all the time — one code 
point for prestandard, another for the standardized version. I see no downside, 
all upside.

Regarding RT-constrain, I don’t follow your reasoning for how it guarantees 
safety in a mixed network.

—John

On Apr 26, 2020, at 3:21 PM, Ali Sajassi (sajassi) 
<[email protected]<mailto:[email protected]>> wrote:

John,

Thanks for your insightful input and suggestion. We have had other situations 
similar to this in the past and we have resolved them by consensus and 
without having a “ticking time bomb” cause a network meltdown. One such 
situation was the need to extend RT-4 to add the originator router’s address, 
which changed the length of the RT-4 route. At the time there were pre-RFC 
implementations from several vendors already deployed in different networks, 
and the vendors decided to go with the new RT-4 format and upgrade to it, making 
sure that interoperability is based on the standard RFC and not the pre-standard 
version. That worked fine, as I and colleagues from other vendors 
(including yours) are not aware of any issues regarding that update. We have a 
less severe situation here because of the following implementation status:

  1.  Some vendors have implemented both formats
  2.  Some vendors have allowed for both lengths (including my vendor) to avoid 
malformed NLRI. Allowing for both lengths doesn’t mean supporting both formats 
but rather accepting both lengths, so that a PE that doesn’t need to import the 
route doesn’t interpret the old format as malformed.
  3.  Vendors that haven’t implemented it prefer the new format
  4.  AFAIK, there is only a single vendor that implemented the v4-only format

So, based on the current data, I think we can have the following two options 
that IMO are simpler:

  1.  Just go with the new format and, for multi-vendor deployments, make sure 
the new format is used. Considering the current deployment situation, where 
intra-DC and intra-site deployments use a single vendor but different vendors 
are used for different sites and DCs, this can be feasible. Maybe the current 
deployment model is why we haven’t run into interop issues.
  2.  Accommodate both lengths (i.e., item 2 above) and turn on 
RT-constraint on the PEs that support the old RT-8 format. This way, the RR can 
properly reflect both RT-8 formats. The PEs supporting the new format can be 
inserted into the network without issue. And the PEs supporting the old format 
can be gradually migrated to the new format.

I should mention that for the RT-4 changes that all the vendors made a long 
time ago, approach (1) was adopted.

Regards,
Ali

From: John Scudder <[email protected]<mailto:[email protected]>>
Date: Friday, April 24, 2020 at 3:01 PM
To: "Mankamana Mishra (mankamis)" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)
Resent-From: <[email protected]<mailto:[email protected]>>
Resent-To: Cisco Employee <[email protected]<mailto:[email protected]>>, 
<[email protected]<mailto:[email protected]>>, 
<[email protected]<mailto:[email protected]>>, 
<[email protected]<mailto:[email protected]>>, 
<[email protected]<mailto:[email protected]>>
Resent-Date: Friday, April 24, 2020 at 3:01 PM

Hi All,

Regarding the proposal to remove the Leave Group Synchronization field from the 
Multicast Leave Synch Route, the current proposal is inadequate. Below I 
discuss why, and provide an alternate suggestion. For those who don’t want to 
read my wall of text, my key motivation is simple:

- The current proposal is a ticking time bomb because it leaves in the field a 
situation where two incompatible implementations can exist undetectably.

And my proposal boils down to two things:

- For the new format NLRI that omits the field, allocate a new code point. 
Deprecate [*] code point 8 going forward.
- Optionally provide a somewhat more sophisticated interworking option for 
backward compatibility.

Nitty-gritty below including considerations for how to transition from code 
point 8 to the TBD code point.

As far as I can tell, there is consensus that the field is not useful. That’s a 
good start. The customary way of dealing with this would be to mark the field 
“reserved”, but evidently there are multiple divergent implementations in the 
field that use different formats for the Multicast Leave Synch Route, some that 
include the field and some that don’t. (I should disclose here that my 
employer’s implementation is in the “include” camp.)

There is an obvious interoperability problem here: BGP implementations are 
required to sanity-check the NLRI they receive (see RFC 4271 section 6.3, RFC 
4760 section 7, and RFC 7606 section 5.3). This checking is required whether or 
not there’s a route target present to cause the router to consume the NLRI; the 
standards require the NLRI to be checked regardless. The consequence of 
malformed NLRI is a session reset. This turns out to be a difficult problem in 
BGP: even though we’ve worked to reduce the number of error cases that require 
a session reset, malformed NLRI are one of the very bad cases we can’t paper 
over. The IDR WG worked on this very hard during the development of RFC 7606; 
it is a real problem. When an implementation expects one NLRI format and 
receives another, that’s a malformed NLRI, and can be expected to cause a 
session reset. To leave this situation in place would be BGP protocol 
malpractice.
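
To make the failure mode concrete, here is a rough sketch of the kind of length check a strict receiver might apply. It is not taken from any implementation: the function names are invented, the fixed fields are zero-filled, and it assumes the three one-octet length fields count bits (0, 32 or 128), as in other EVPN routes.

```python
import ipaddress

HEADER_OCTETS = 8 + 10 + 4  # RD + ESI + Ethernet Tag ID


def build_nlri(source, group, originator, with_sync):
    """Encode a test NLRI; header, sync number and trailer are zero-filled."""
    body = bytes(HEADER_OCTETS)
    for address in (source, group, originator):
        packed = ipaddress.ip_address(address).packed
        body += bytes([len(packed) * 8]) + packed  # length in bits, then address
    if with_sync:
        body += bytes(4)  # Leave Group Synchronization number
    return body + bytes(2)  # Maximum Response Time + Flags


def trailing_octets(nlri):
    """Skip the three length-prefixed addresses; return how many octets follow."""
    pos = HEADER_OCTETS
    for _ in range(3):
        if pos >= len(nlri):
            raise ValueError("truncated NLRI")
        bits = nlri[pos]
        if bits not in (0, 32, 128):
            raise ValueError("bad address length")
        pos += 1 + bits // 8
    if pos > len(nlri):
        raise ValueError("truncated NLRI")
    return len(nlri) - pos


def is_well_formed(nlri, expect_sync):
    """Strict check for a receiver that knows only one of the two formats."""
    return trailing_octets(nlri) == (6 if expect_sync else 2)


old = build_nlri("10.0.0.1", "232.1.1.1", "192.0.2.1", with_sync=True)
new = build_nlri("10.0.0.1", "232.1.1.1", "192.0.2.1", with_sync=False)
```

Here `is_well_formed(new, expect_sync=True)` is false: a receiver that only knows the format with the field sees the shorter NLRI as malformed, regardless of whether it would have imported the route.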

As far as I can tell, this means it is only through dumb luck that we have had 
two different NLRI formats in the wild without a network meltdown. This seems 
like a ticking time bomb situation.

The implementations are in the field already, we can’t just stamp our feet and 
say “you should have followed the spec” and make the problem go away. So we 
have to think about how to migrate to one agreed format, whatever it may be. 
(The idea that interoperability concerns can be addressed by simply never 
mixing old and new implementations in the same network can be dismissed out of 
hand. That amounts to “there are no interoperability problems if there’s no 
interoperation”, and are we not a standards organization, and is our goal not 
interoperability?)

Let’s take as a given that the agreed format will end up being the one that 
removes the Leave Group Synchronization field. Since something has to change, 
it may as well be the thing that removes the vestigial field.

The cleanest solution is to keep the format depicted in draft -04 (and its 
predecessors) on code point 8, and to allocate a new code point for the new 
format. The old code point would be deprecated, the new code point would be the 
standardized version. It turns out that moving code points is exactly the 
strategy prescribed (or at least strongly recommended) by RFC 7120 section 3.2:

  If at some point changes that are not backward compatible are
  nonetheless required, a decision needs to be made as to whether
  previously allocated code points must be deprecated (see Section 3.3
  for more information on code point deprecation).  The considerations
  include aspects such as the possibility of existing deployments of
  the older implementations and, hence, the possibility for a collision
  between older and newer implementations in the field.

There are existing deployments of older implementations in our case, of course, 
so this advice applies. Keep in mind that RFC 7120 is the process that was used 
to get code point 8 to begin with, so we pretty much have made a contract to 
follow its recommendations.

Code point migration, from the deprecated value 8 to the TBD standardized 
value, is a little bit of an annoyance but the general methodology is 
well-known; this is not rocket science. It looks something like: new 
implementations have to be able to consume both the old format and the new. By 
default they generate the old. Once the entire network is known to be upgraded 
to an RFC-compliant version, the operator configures them to generate the new. 
In the future, the default can be changed, in the farther future the support 
for the old code point can be removed (this end state tends to be aspirational 
in my experience, but we can dream).
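
As a sketch only, the migration steps above might be modeled like this. The class, the `emit_new` knob and the helper are hypothetical names, and the new code point is left as a "TBD" placeholder because no value has been allocated.

```python
from dataclasses import dataclass

OLD_CODE_POINT = 8      # existing route type, to be deprecated
NEW_CODE_POINT = "TBD"  # not yet allocated; deliberately not a number


@dataclass
class Speaker:
    name: str
    upgraded: bool = False  # runs software that understands both code points
    emit_new: bool = False  # operator knob; default keeps generating the old one

    def accepts(self, code_point):
        if self.upgraded:
            return code_point in (OLD_CODE_POINT, NEW_CODE_POINT)
        return code_point == OLD_CODE_POINT

    def emits(self):
        if self.upgraded and self.emit_new:
            return NEW_CODE_POINT
        return OLD_CODE_POINT


def safe_to_enable_new(network):
    """The knob should be flipped only once every speaker is upgraded."""
    return all(speaker.upgraded for speaker in network)
```

The point of the default is that an upgraded speaker behaves exactly like an old one until the operator decides the whole network is ready.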

I think this should be the solution that is standardized. It keeps the standard 
as simple as possible and provides the format the WG desires (at least, per the 
email so far).

That still leaves open the question of interoperability between the 
pre-standard implementations currently in the field, the ones that generate 
NLRI that follows the format specced in -04 and the ones that don't. Mankamana 
mentions “RR must accept both” as the interoperability solution; I think this 
is necessary but not sufficient because it still doesn’t protect against the 
potential for catastrophic failure as I discuss in my first few paragraphs. 
Rather, I would say that any implementation that wants to interoperate with 
prestandard versions has to provide a configuration option to tell it what 
version of the NLRI to emit towards any given peer. It can and should still 
consume both, but it has to know what kind to emit. I’m not sure whether this 
needs to go in the standard. Maybe it should go in an appendix.

If the WG likes this approach I’d be glad to send text, if wanted.

Thanks,

—John

[*] Note that “deprecate” basically means “you are encouraged to stop using 
this and start using the standardized code point”. I can find a citation if 
there’s any dispute about this, but mostly, experience has shown me that people 
tend to have funny ideas about this word, so I thought I’d put in a line about 
it.

On Apr 23, 2020, at 2:31 AM, Mankamana Mishra (mankamis) 
<[email protected]<mailto:[email protected]>>
 wrote:

All,
After WGLC and before IETF 106 (Singapore), it came to our notice that there 
were implementation discrepancies for this draft 
(https://tools.ietf.org/html/draft-ietf-bess-evpn-igmp-mld-proxy-04#section-9.3).
The draft defined the NLRI as

             +--------------------------------------------------+
             |  RD (8 octets)                                   |
             +--------------------------------------------------+
             | Ethernet Segment Identifier (10 octets)          |
             +--------------------------------------------------+
             |  Ethernet Tag ID  (4 octets)                     |
             +--------------------------------------------------+
             |  Multicast Source Length (1 octet)               |
             +--------------------------------------------------+
             |  Multicast Source Address (variable)             |
             +--------------------------------------------------+
             |  Multicast Group Length (1 octet)                |
             +--------------------------------------------------+
             |  Multicast Group Address (Variable)              |
             +--------------------------------------------------+
             |  Originator Router Length (1 octet)              |
             +--------------------------------------------------+
             |  Originator Router Address (variable)            |
             +--------------------------------------------------+
             |  Leave Group Synchronization  # (4 octets)       |
             +--------------------------------------------------+
             |  Maximum Response Time (1 octet)                 |
             +--------------------------------------------------+
             |  Flags (1 octet)                                 |
             +--------------------------------------------------+
The Leave Group Synchronization number was part of the NLRI there, but the two 
implementations were:

  1.  With this field as part of NLRI
  2.  Without this field as part of NLRI
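
For a sense of what the two lengths look like, here is a small back-of-the-envelope calculation based on the field sizes in the diagram above. The function name is made up, and it assumes the three length fields are expressed in bits (32 for IPv4, 128 for IPv6).

```python
# Fixed-size fields common to both formats, in octets:
# RD + ESI + Ethernet Tag ID + three 1-octet length fields
# + Maximum Response Time + Flags.
FIXED_OCTETS = 8 + 10 + 4 + 3 * 1 + 1 + 1
SYNC_OCTETS = 4  # Leave Group Synchronization number


def nlri_length(source_bits, group_bits, originator_bits, with_sync):
    """Total NLRI length in octets for the given address sizes (in bits)."""
    addresses = (source_bits + group_bits + originator_bits) // 8
    return FIXED_OCTETS + addresses + (SYNC_OCTETS if with_sync else 0)
```

With all-IPv4 addresses this gives 43 octets with the field and 39 without; with all-IPv6 addresses, 79 and 75.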

Implementation survey as of 2019:
Since it came to our notice that there are at least two implementations which 
would not interop, we tried to survey other implementations. We asked on the 
IETF and NANOG forums and reached out to some vendors directly as well. 
The implementations were:
Cisco – without Seq number
Juniper – with Seq number
Arista – with and without Seq number
Apart from these vendors, we did not get a response from anyone else who had 
implemented these routes.

Before IETF 106 (Singapore) there were a couple of discussions among the authors 
and other vendors. It was evident that there are two implementations which would 
not interop as is, and that the sequence number for IGMP does not have any value 
or need. The majority of vendors were OK with removing this field from the NLRI, 
as there is no practical use case. So one proposal was to remove the field and, 
to make sure we interop with the old version, to

1.      Remove Seq number from NLRI

2.      RR MUST accept both lengths of NLRI

IETF 106 update:

These changes were presented at IETF 106 in Singapore as well.

Implementation changes post IETF 106, as of today:

Nokia - implemented without Seq number, and RR supports both lengths
Cisco - modified implementation to make sure that as RR we support both lengths
Arista - already had an implementation that supports both lengths

Update in Draft :

             +--------------------------------------------------+
             |  RD (8 octets)                                   |
             +--------------------------------------------------+
             | Ethernet Segment Identifier (10 octets)          |
             +--------------------------------------------------+
             |  Ethernet Tag ID  (4 octets)                     |
             +--------------------------------------------------+
             |  Multicast Source Length (1 octet)               |
             +--------------------------------------------------+
             |  Multicast Source Address (variable)             |
             +--------------------------------------------------+
             |  Multicast Group Length (1 octet)                |
             +--------------------------------------------------+
             |  Multicast Group Address (Variable)              |
             +--------------------------------------------------+
             |  Originator Router Length (1 octet)              |
             +--------------------------------------------------+
             |  Originator Router Address (variable)            |
             +--------------------------------------------------+
             |  Maximum Response Time (1 octet)                 |
             +--------------------------------------------------+
             |  Flags (1 octet)                                 |
             +--------------------------------------------------+

  1.  Removed Seq number from EVPN route type 8
  2.  Added text stating that older versions of the draft had 4 extra bytes and 
that the RR MUST accept and reflect both lengths.

WGLC:
It has been discussed with the chairs, and we agreed on one more short WGLC 
once the changes are posted.

Before publishing the draft, we wanted to check whether any other vendors have 
any concerns.

Mankamana
_______________________________________________
BESS mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/bess
