Hi Eric,
2015-09-28, Eric C Rosen:
From the draft:
"This document does not provide any new protocol elements or
procedures"
I think we can agree that it does not specify any new protocol elements.
> [Thomas] Sections 3, 4.1.1 and 9, at least, introduce what I think
can fairly be considered new procedures.
I don't see anything in section 3 or 4.1.1 that I would call "new
procedures".
However, your point is well-taken about section 9, as RFC6514 does not
really address the use of timers to achieve "make before break"
functionality. On the other hand, RFC 6513 section 7 does specify the
use of timers when switching a flow from one P-tunnel to another, so the
use of timers is not a new addition.
When we started implementing ingress replication, we found that it
wasn't always very clear how to apply the procedures of RFC6514 when
ingress replication is being used. The purpose of this draft is to pull
together into one place all the procedures relevant to ingress
replication, and to explain clearly how ingress replication is done
using the procedures of RFC6514. The focus is on getting it clear
enough to increase the likelihood of multi-vendor interoperability. We
really tried hard to avoid creating any new IR-specific procedures,
though section 9 may be an exception.
And I fully agree that the specs do fit this intention, but one
exception is enough to make the assertion wrong.
I would suggest distinguishing intent from strict truth, e.g. by
replacing the quoted sentence with "To bring the required clarifications,
this document updates the behavior specified by RFC6514, but does so
without introducing new protocol elements or any fundamentally new
procedures". Or something along these lines.
From the draft:
"4.1. Advertised P-tunnels The procedures in this section apply
when the P-tunnel to be joined has been advertised in an S-PMSI A-D
route, an Inter-AS I-PMSI A-D route, or an Intra-AS I-PMSI A-D route."
> For the sake of clarity, and to avoid any misinterpretation, can you
please add ", and the PMSI Tunnel Attribute is of type Ingress Replication"
Well, section 4 is called "How to Join an IR P-tunnel", and the entire
draft is exclusively about IR P-tunnels. If you think that is not
clear, perhaps the sentence above should just say "when the IR P-tunnel
to be joined has been ..."
Yes, that would be just fine.
From the draft:
"Note that if a set of IR P-tunnels is joined in this manner, the
"discard from the wrong PE" procedures of [RFC6513] section 9.1.1 cannot
be applied to that P-tunnel. Thus duplicate prevention on such IR
P-tunnels requires the use of either Single Forwarder Selection
([RFC6513] section 9.1.2) or native PIM procedures ([RFC6513] section
9.1.3).
[Thomas] I would suggest rewording with "Note that, in the general case,
..." and "...unless the tunneling technique relies on an IP transport,
which may allow the identification of the PE sourcing the traffic".
It is certainly true in theory that one could use an IP encapsulation in
this way, but in practice it creates a couple of complications:
- I think it presupposes that the IP source address field of the
tunneled packets contains the same IP address that the ingress PE puts
in the Global Administrator field of the VRF Route Import EC that it
attaches to the unicast routes that it distributes.
(I guess it could use a different one and be made to advertise which one
to expect in a BGP attribute.)
- All the egress PEs need to implement this IP address check in the data
plane forwarding path.
Yes, and this is already true in RFC6513.
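To make the data plane check discussed above concrete, here is a minimal sketch. It assumes that the outer IP source address of a tunneled packet is compared with the address the selected UMH places in the Global Administrator field of its VRF Route Import EC. The function and parameter names are hypothetical and are not taken from the draft or from RFC6513.

```python
from ipaddress import ip_address


def accept_from_expected_pe(outer_src: str, umh_global_admin: str) -> bool:
    """Return True when the tunneled packet was sourced by the selected UMH.

    outer_src        -- IP source address of the tunneled (outer) packet
    umh_global_admin -- address the selected UMH advertises in the Global
                        Administrator field of its VRF Route Import EC
    Both names are hypothetical, chosen for this illustration only.
    """
    # Parse both sides so that textual variants of one address compare equal.
    return ip_address(outer_src) == ip_address(umh_global_admin)


# A packet from the selected UMH is accepted; one from another PE is dropped.
print(accept_from_expected_pe("192.0.2.1", "192.0.2.1"))
print(accept_from_expected_pe("192.0.2.2", "192.0.2.1"))
```

The comparison only works if the ingress PE uses the same address in both places, which is the first complication listed above.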
While using the IP encapsulation in this way is a possible option, it
has never seemed like a very attractive option, and as far as I know, no
one has implemented it.
To avoid the need for an option like this, I always recommend that if
one wants to use IR by default, one should advertise the IR P-tunnels in
a (C-*,C-*) S-PMSI A-D route rather than in an Intra-AS I-PMSI A-D
route. One can still use IP tunnels if one wants, but the "discard from
the wrong PE" procedures would be based on the MPLS label that is
carried by the IP payload.
I would tend to agree that the choice made makes sense.
It is however better not to make it look like the only possible design
choice ("'discard from the wrong PE' procedures of [RFC6513] section
9.1.1 cannot be applied to that P-tunnel"), to avoid misleading
future readers.
I think that at least "[procedure xyz] cannot be applied to that
P-tunnel, in the general case," would be better.
Another problem with using the IP header to apply the "discard from the
wrong PE" procedure is that it will not easily generalize to the case of
extranet. (Still another problem would be that it is just one more
unnecessary option.)
I could add some text explaining this, and explaining why it is not
recommended to use the IP header to apply the "discard from the wrong
PE" procedure.
Yes, this would be useful to document in one or two paragraphs in an
Appendix for instance.
Now, regarding the use of timers when switching UMH ...
[Thomas] I understand -- even if that is a bit implicit -- that the NLRI
for the Leaf A-D route to the old UMH is the same as the NLRI for the
Leaf A-D route to the new UMH.
Correct.
See below: a lot is left implicit in the sentence as currently written,
too much for me to understand it correctly on a first reading.
[Thomas] But I don't in fact understand why this has to be the case...
Leaf A-D routes are originated in response to I/S-PMSI A-D routes, and
the rules for creating the NLRI of a Leaf A-D route, as specified in
RFC6514, are independent of the tunnel type.
I agree with that.
[Thomas] One has to ignore the procedures to build a Leaf A-D route of
RFC6514 since this document specifies new ones for IR in section 4.1.1
I don't understand why you say that. The 4.1.1 rules for generating the
NLRI of a Leaf A-D route follow the RFC6514 procedures.
(see below)
[Thomas] section 4.1.1 says that the Key field of the Leaf A-D route
contains the "tunnel identifier" defined in section 3
Yes; the tunnel identifier defined in section 3 is the NLRI of the
corresponding I/S-PMSI A-D route, which is exactly what RFC6514
specifies for the route key.
(see below)
[Thomas] section 3 says that (when the "Leaf info required" bit is set,
which is the case for section 4.1.1) the tunnel identifier is
RECOMMENDED to be a routable address of the router that built the PTA
No; section 3 says that the "tunnel identifier" field of the PTA is
recommended to be a routable address of the router that built the PTA.
But section 3 also tries to make it clear that the identifier of the IR
P-tunnel does not appear in the tunnel identifier field of the PTA.
I have re-read section 3 and now see why I had initially misunderstood
section 4.1.1. Section 3 does in fact say that ''the identifier of an IR
P-tunnel is not the "Tunnel Identifier" the PTA'', which is pretty close
to "the tunnel identifier is not the tunnel identifier".
When you read Section 4.1.1, the phrase "MUST contain the tunnel
identifier (as defined in Section 3 above)" might be misunderstood,
especially because this time "IR P-tunnel identifier" has become just
"tunnel identifier" (which might be read as Tunnel Identifier with the
uppercase missing). This is made even more likely by the fact that one
may have in mind that "MANDATORY" wording most often relates to new
things that one has to be careful about, rather than to a mere repeat of
an existing spec.
I would suggest the following wording:
Current text:
Once the UMH is determined, the router joining the IR P-tunnel
originates a Leaf A-D route. The NLRI of the Leaf A-D route MUST
contain the tunnel identifier (as defined in Section 3 above) as its
"route key".
Proposal:
Once the UMH is determined, the router joining the IR P-tunnel
originates a Leaf A-D route following the procedures in RFC6514;
i.e. the route key in the NLRI of the Leaf A-D route MUST be set to
the NLRI of
the route triggering the join (which happens to be the IR P-tunnel
identifier, as defined in section 3, and distinct from the PTA
Tunnel Identifier field).
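
As an illustration of the proposed wording, here is a minimal sketch. It assumes a simplified model in which a Leaf A-D route carries a route key (the NLRI of the triggering I/S-PMSI A-D route), the originating router's address, and a Route Target identifying the UMH. The class, field and function names are hypothetical, and the encodings are deliberately not those of RFC6514.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LeafADRoute:
    route_key: bytes   # NLRI of the I/S-PMSI A-D route that triggered the join
    originator: str    # address of the router originating the Leaf A-D route
    route_target: str  # identifies the UMH; an attribute, not part of the NLRI

    def nlri(self) -> tuple:
        # In this model the NLRI is the route key plus the originator only.
        return (self.route_key, self.originator)


def join(pmsi_nlri: bytes, originator: str, umh: str) -> LeafADRoute:
    """Originate a Leaf A-D route in response to an I/S-PMSI A-D route."""
    return LeafADRoute(route_key=pmsi_nlri, originator=originator,
                       route_target=umh)


def change_umh(route: LeafADRoute, new_umh: str) -> LeafADRoute:
    """Switch UMH: only the Route Target changes, the NLRI stays the same."""
    return replace(route, route_target=new_umh)


old = join(b"s-pmsi-nlri", "192.0.2.10", umh="192.0.2.1")
new = change_umh(old, "192.0.2.2")
print(old.nlri() == new.nlri())
```

In this model the route key never depends on the PTA Tunnel Identifier field, which is the distinction the proposed text tries to make explicit.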
[Thomas] Anyhow, it seems to me that ensuring that the Key changes when
the UMH changes, would simplify the make before break procedure:
everything is at the hand of the downstream PE which can advertise both
routes for as long as it wishes,
That does not seem to me to be a simplification. The specified
procedure is pretty simple:
- To change parents, only a single control plane operation is needed: a
change in the RT of the Leaf A-D route.
Note that I haven't implied anywhere that re-originating a new route
would be of a problematic complexity.
After a thorough re-reading of section 3, I now understand why I
initially misunderstood the statement that "only a change in the RT of
the Leaf A-D route is needed".
Let me suggest a rewording that may keep other readers from getting lost
as I was...
Current text:
Suppose a child node has joined a particular IR P-tunnel via a
particular UMH, and it now determines (for whatever reason) that it
needs to change its UMH on that P-tunnel.
A lot is in fact left implicit in this sentence: "joined ... via"
and "a particular P-tunnel"/"that P-tunnel" refer to the particulars in
sections 3 and 4.1.1.
Proposal:
Suppose a child node has joined a particular IR P-tunnel via a
particular UMH (following procedures in section 4), and it now
determines (for whatever reason) that it needs to change its UMH
on that P-tunnel (same tunnel identifier as defined in Section 3).
This can for instance arise on a change of UMH for an intermediate
node in a deployment where segmented trees are used.
- In both the upstream and the downstream node, the to-be-deleted data
plane state is timed out.
- There are no data-driven state changes. (Note that to avoid
data-driven state changes, the downstream node really needs to run a
timer in order to decide when to modify its data plane state.)
- The timers do not need to be very precisely tuned, and certainly do
not need to be tuned on a per-peer basis.
- We retain the RFC6514 principle of keeping the NLRI independent of the
tunnel type. Thus we minimize the chances of creating unintended
side-effects or new corner cases that need to be thought out. That is,
we minimize the chances of breaking existing MVPN implementations in
unanticipated ways.
The above is a very precise refutation of issues that I hadn't even raised.
If PETA was taking care of strawmen, I would certainly alert them at
once ;)
You have left uncommented the one reason I had given to illustrate the
complexity of this solution: with the specs as they are, somebody will
have to write code to make these two timers tunable, somebody will have
to test these new settings, somebody will have to map that into a YANG
model (or similar), and somebody will have to support that in an OSS
tool and use it to force consistent values on all PEs/A(S)BRs.
After getting a better understanding of the procedures, I agree they are
useful, under the condition that a reasonable default for each of the
two timers is standardized in the specs (so that they can be implemented
viably even before all the actions described above happen).
I would propose:
An implementation of these specs SHOULD offer means to configure
the values of timers 1 and 2. An implementation of these specs MUST
have a default value for timer 1 of at least [T1] seconds and a
value of timer 2 of at most [T2] seconds.
T1 and T2 are then left to be determined, with [T2] < [T1].
The target is to have T2 large enough to make it likely that the new UMH
has received and processed the route.
I would offer T2=60s and T1=120s.
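
To illustrate what the proposed text would ask of an implementation, here is a minimal sketch. It assumes the offered values T1=120s and T2=60s and checks only the constraints stated in the proposal: the default of timer 1 is at least T1, the default of timer 2 is at most T2, and T2 < T1. The class and function names are hypothetical.

```python
from dataclasses import dataclass

T1_SECONDS = 120  # offered floor for the default value of timer 1
T2_SECONDS = 60   # offered ceiling for the default value of timer 2


@dataclass(frozen=True)
class UmhSwitchTimers:
    """Default timer values shipped by an implementation (hypothetical)."""
    timer1: int = T1_SECONDS
    timer2: int = T2_SECONDS


def defaults_are_compliant(defaults: UmhSwitchTimers) -> bool:
    """Check shipped defaults against the proposed MUST-level constraints."""
    return (T2_SECONDS < T1_SECONDS
            and defaults.timer1 >= T1_SECONDS
            and defaults.timer2 <= T2_SECONDS)


print(defaults_are_compliant(UmhSwitchTimers()))
print(defaults_are_compliant(UmhSwitchTimers(timer1=30)))
```

The check applies to the defaults only; operator-configured values are deliberately left unconstrained here, since the proposal only says that a configuration knob SHOULD exist.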
Of course, setups that want finer tuning to optimize bandwidth will
typically use the tuning knobs to change the timers.
Comments ?
-Thomas