Re: [bess] WGLC, IPR and implementation poll for draft-ietf-bess-mvpn-fast-failover

Jeffrey (Zhaohui) Zhang Fri, 02 Aug 2019 10:52:15 -0700

Hi Greg,

Sorry for the late response.
Please see zzh2> in snipped text below.

From: Greg Mirsky <gregimir...@gmail.com>
Sent: Saturday, May 11, 2019 3:24 PM
To: Jeffrey (Zhaohui) Zhang <zzh...@juniper.net>
Cc: bess-cha...@ietf.org; EXT - thomas.mo...@orange.com 
<thomas.mo...@orange.com>; Robert Kebler <rkeb...@juniper.net>; BESS 
<bess@ietf.org>
Subject: Re: [bess] WGLC, IPR and implementation poll for 
draft-ietf-bess-mvpn-fast-failover

Hi Jeffrey,
thank you for your consideration and the detailed comments with great 
suggestions. Please find my answers below under GIM3>> tag. Attached is the 
diff to highlight the updates.

Regards,
Greg

On Tue, May 7, 2019 at 7:43 AM Jeffrey (Zhaohui) Zhang 
<zzh...@juniper.net<mailto:zzh...@juniper.net>> wrote:
Hi Greg,

Most of the changes are fine, though I suggest replacing the following:

   For P-tunnels of type P2MP MPLS-TE, the status of the P-tunnel is
   considered up if one or more of the P2MP RSVP-TE LSPs to the same PE,
   identified by the P-tunnel Attribute, are in Up state.

With the following:

   For P-tunnels of type P2MP MPLS-TE, the status of the P-tunnel is
   considered up if the sub-LSP to this downstream PE is in Up state.
GIM3>> Accept with one question. As this is the first sentence in the section, 
what is the PE we refer to as "this downstream PE"? Should we use "a downstream 
PE"?

Zzh2> Not sure what the right word is, but the point is that it is from a 
particular downstream PE’s point of view (I remember often seeing text like 
“this router” in RFCs).

Not all comments have been addressed, though. I trimmed some text below and 
highlighted the outstanding ones with “=============”. You may need to refer to 
my previous email for correlation/details.

Jeffrey

On Thu, Mar 14, 2019 at 3:04 AM Jeffrey (Zhaohui) Zhang 
<zzh...@juniper.net<mailto:zzh...@juniper.net>> wrote:
Thomas, Bob,

Some questions below for you. Some old, and some new.
 ==============================

Zzh> It’s not that the “rules … are not consistent”. It’s that by nature some 
PEs may think the tunnel is down while the others may think the tunnel is still 
up (because they’re on different tunnel branches), even when they follow the 
same rules. Traffic duplication in this case is also only with inclusive 
tunnels – so how about the following?

   Because all PEs may arrive at a different
   conclusion regarding the state of the tunnel,
   procedures described in Section 9.1.1 of [RFC 6513] MUST be used
   when using inclusive tunnels.
GIM3>> Got it, thx. Would s/may/could/ be acceptable to avoid questions about 
RFC2119-like language?

Zzh2> I think it should be a MUST – otherwise you get duplicates when different 
PEs pick different upstream PEs.

===============================
 Additionally, the text in section 3 seems to be biased toward Single 
Forwarder Election choosing the UMH with the highest IP address. Section 5 of 
RFC6513 also describes two other options: hashing, or selection based on the 
“installed UMH route” (aka unicast-based). It is not clear how the text in this 
document applies to hashing-based selection, and I don’t see how the text 
applies to unicast-based selection. Some rewording/clarification is needed here.
GIM>> How would the use of an alternative UMH selection algorithm change the 
documented use of p2mp BFD? Do you think that if the Upstream PE is selected 
using, for example, hashing, then the defined use of BGP-BFD and p2mp BFD 
itself is no longer applicable?

Zzh> It’s not that the alternative UMH selection algorithm changes the 
documented use of p2mp BFD. It’s the other way around: tunnel state changes the 
selection result. I guess hashing can still be used (this document only 
controls what goes into the candidate pool). For unicast-based selection I 
thought it’d no longer work, but then I noticed the following:

   o  second, the UMH candidates that advertise a PMSI bound to a tunnel
      that is "down" -- these will thus be used as a last resort to
      ensure a graceful fallback to the basic MVPN UMH selection
      procedures in the hypothetical case where a false negative would
      occur when determining the status of all tunnels

Zzh> So this should still work, although ideally the PE advertising the next 
best route should be considered before going to the last resort (of using the 
PE advertising the best route but whose tunnel is down).
GIM3>> I hope I've got the idea. Below is the updated text (second becomes 
third and your proposal - second):
NEW TEXT:
   o  Second, the PE advertising the next best route is to be
      considered.

   o  Third, the UMH candidates that advertise a PMSI bound to a tunnel
      that is "down" -- these will thus be used as a last resort to
      ensure a graceful fallback to the basic MVPN UMH selection
      procedures in the hypothetical case where a false negative would
      occur when determining the status of all tunnels.

Zzh2> I checked the surrounding text in this draft and section 5.1.3 in 
RFC6513. I believe section 3 of this document, before its subsection 3.1, 
should be rewritten as follows:

3.  UMH Selection based on tunnel status

   Current multicast VPN specifications [RFC6513], section 5.1, describe
   the procedures used by a multicast VPN downstream PE to determine
   what the upstream multicast hop (UMH) is for a given (C-S, C-G).

   The procedure described here is an OPTIONAL procedure that consists
   of having a downstream PE take into account the status of the P-tunnel
   rooted at each possible upstream PE. Because the PEs could arrive at
   different conclusions regarding the state of the tunnel, procedures
   described in Section 9.1.1 of [RFC6513] MUST be used when using
   inclusive tunnels.

   For a given downstream PE and a given VRF, the P-tunnel corresponding
   to a given upstream PE for a given (C-S, C-G) state is the S-PMSI
   tunnel advertised by that upstream PE for this (C-S, C-G) and
   imported into that VRF, or if there isn't any such S-PMSI, the I-PMSI
   tunnel advertised by that PE and imported into that VRF.

   There are three options specified in Section 5.1 of [RFC6513] for a
   downstream PE to select an Upstream PE.

   The first two options select the Upstream PE from a candidate PE set
   either based on IP address or a hashing algorithm. When used together
   with the optional procedure of considering the P-tunnel status as in
   this document, a candidate upstream PE is included in the set if it either

      A.  advertises a PMSI bound to a tunnel, where the specified tunnel
          is not known to be down, or

      B.  does not advertise any x-PMSI applicable to the given (C-S, C-G)
          but has associated a VRF Route Import BGP attribute to the
          unicast VPN route for S (this is necessary to avoid
          incorrectly invalidating a UMH PE that would use a policy
          where no I-PMSI is advertised for a given VRF and where only
          S-PMSI are used, the S-PMSI advertisement being possibly done
          only after the upstream PE receives a C-multicast route for
          (C-S, C-G)/(C-*, C-G) to be carried over the advertised
          S-PMSI).

   If the resulting candidate set is empty, then the procedure is repeated
   without considering the P-tunnel status.

   The third option simply uses the installed UMH Route (i.e., the "best"
   route towards the C-root) as the Selected UMH Route, and its originating
   PE is the selected Upstream PE. With the optional procedure of
   considering P-tunnel status as in this document, the Selected UMH Route
   is the best one among those whose originating PE's P-tunnel is not "down".
   If that does not exist, the installed UMH Route is selected regardless
   of the P-tunnel status.
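
The selection logic in the proposed text above can be sketched roughly as
follows. This is only a minimal illustration under stated assumptions, not
text for the draft: the names Candidate, TunnelStatus, in_candidate_set,
select_highest_address, select_installed_route and route_preference are all
hypothetical, "best route" is reduced to a single integer, and hashing-based
selection is omitted since it would draw from the same filtered set.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_address
from typing import Iterable, Optional


class TunnelStatus(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Candidate:
    """One possible upstream PE for a given (C-S, C-G). Hypothetical type."""
    pe_address: str
    # None means the PE advertises no x-PMSI applicable to this (C-S, C-G).
    tunnel: Optional[TunnelStatus]
    has_vrf_route_import: bool = True
    # Assumption: a higher value stands for a better unicast route to C-S.
    route_preference: int = 0


def in_candidate_set(c: Candidate) -> bool:
    """Conditions A and B of the proposed text."""
    if c.tunnel is None:
        return c.has_vrf_route_import          # condition B
    return c.tunnel is not TunnelStatus.DOWN   # condition A: not known down


def select_highest_address(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """IP-address-based option: filter on tunnel status, then pick the
    highest address; repeat without the filter if the set is empty."""
    everyone = list(candidates)
    pool = [c for c in everyone if in_candidate_set(c)] or everyone
    return max(pool, key=lambda c: ip_address(c.pe_address), default=None)


def select_installed_route(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Unicast-based option: best route among PEs whose tunnel is not down,
    else the best route regardless of tunnel status."""
    everyone = list(candidates)
    pool = [c for c in everyone if c.tunnel is not TunnelStatus.DOWN] or everyone
    return max(pool, key=lambda c: c.route_preference, default=None)
```

Note that both functions operate on an unordered set and only fall back to
the unfiltered set when the filtered one is empty.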

Zzh2> The reason is that the candidate set is not ordered – it’s just a set 
to select from (either based on IP address or hashing).

========================================

zzh> BTW, the same applies to 3.1.7 as well.
GIM>> Agree

==================================

3.1.7.  Per PE-CE link BFD Discriminator

   The following approach is defined for the fast failover in response
   to the detection of PE-CE link failures, in which UMH selection for a
   given C-multicast route takes into account the state of the BFD
   session associated with the state of the upstream PE-CE link.

3.1.7.1.  Upstream PE Procedures

   For each protected PE-CE link, the upstream PE initiates a multipoint
   BFD session [I-D.ietf-bfd-multipoint] as MultipointHead toward
   downstream PEs.  A downstream PE monitors the state of the p2mp
   session as MultipointTail and MAY interpret transition of the BFD
   session into Down state as the indication of the associated PE-CE
   link being down.

Since the BFD packets are sent over the P2MP tunnel, not the PE-CE link, my 
understanding is that the BFD discriminator is still for the tunnel and not 
tied to the PE-CE link; but unlike the previous case, the root will 
stop sending BFD messages when it detects the PE-CE link failure. As far as the 
egress PEs are concerned, they don’t know if it is the tunnel failure or PE-CE 
link failure.

If my understanding is correct, the wording should be changed.
GIM>> There are ways other than stopping transmission of BFD control packets 
for the egress PE to distinguish the two conditions. For example, the 
MultipointHead MAY set the State to AdminDown and continue sending BFD control 
packets. If and when the PE-CE link is restored to Up, the MultipointHead can 
set the state to Up in the BFD control packet.
===================== this needs more discussion =====
===== should be clear on which way is done – stop sending BFD message or use 
AdminDown
===== a PMSI may be used for many flows, which may use different PE-CE 
interfaces on the ingress PE. A downstream PE would not know which interface it 
should track for a particular flow.
GIM3>> Thank you for helping me to understand the problem with PE-CE and p2mp 
BFD. I've updated the paragraph in 3.1.7 and found a better method to 
indicate the PE-CE link failure to the downstream. I also stress that, though 
the PE-CE association is likely to be 1:1, it is outside the scope of the draft. 
Please let me know if the new text addresses your questions:
NEW TEXT:
   The following approach is defined for the fast failover in response
   to the detection of PE-CE link failures, in which UMH selection for a
   given C-multicast route takes into account the state of the BFD
   session associated with the state of the upstream PE-CE link.
   According to section 6.8.17 [RFC5880], failure of a PE-CE link MAY be
   communicated to the downstream PE by setting the bfd.LocalDiag of the
   p2mp BFD session associated with this link to Concatenated Path Down
   and/or Reverse Concatenated Path Down.  The mechanism to communicate
   the mapping between the PE-CE link and the associated BFD session is
   outside the scope of this document.

Zzh2> Because you still want to track the tunnel state (in addition to PE-CE 
interface state), you would need at least two discriminators – one for the 
tunnel and one for the PE-CE link. However, the new “BGP-BFD attribute” 
defined in this spec only accommodates one discriminator (and my understanding 
is that you can’t have more than one of the same attribute).
Zzh2> The simplest solution is to just use the same discriminator (vs. a per 
PE-CE link discriminator). With that, the ENTIRE section 3.1.7 (including its 
subsections) becomes the following:

3.1.7 Tracking upstream PE-CE link status

   In case the PE-CE link on an upstream PE fails even though the provider
   tunnel is still up, it is desirable for the downstream PEs to switch to a
   backup upstream PE. To achieve that, if the upstream PE detects that its
   PE-CE link has failed, it SHOULD set the bfd.LocalDiag of the p2mp BFD
   session to Concatenated Path Down and/or Reverse Concatenated Path Down,
   unless it switches to a new PE-CE link immediately (in that case the
   upstream PE will start tracking the status of the new PE-CE link).
   When a downstream PE receives that bfd.LocalDiag code, it treats it as if
   the tunnel itself had failed and tries to switch to a backup PE.
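
To make the intended behavior on both ends concrete, here is a minimal
sketch. It is an illustration only, under these assumptions: the function
names upstream_local_diag and should_switch_to_backup are hypothetical, the
upstream PE is shown choosing Concatenated Path Down alone (the text allows
either or both codes), and the downstream PE is modeled as looking only at
the session state and the received diagnostic. The numeric values are the
State and Diag code points from RFC5880.

```python
from enum import IntEnum


class State(IntEnum):
    """BFD session states (RFC5880 Sta field)."""
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


class Diag(IntEnum):
    """Subset of BFD diagnostic codes (RFC5880 Diag field)."""
    NO_DIAGNOSTIC = 0
    CONCATENATED_PATH_DOWN = 6
    REVERSE_CONCATENATED_PATH_DOWN = 8


# Codes that the proposed 3.1.7 uses to signal an upstream PE-CE failure.
PE_CE_FAILURE_DIAGS = frozenset(
    {int(Diag.CONCATENATED_PATH_DOWN), int(Diag.REVERSE_CONCATENATED_PATH_DOWN)}
)


def upstream_local_diag(pe_ce_link_up: bool, switched_to_new_link: bool) -> Diag:
    """Value the upstream PE places in bfd.LocalDiag of the p2mp session.

    Assumption: Concatenated Path Down is used alone to signal the failure.
    """
    if pe_ce_link_up or switched_to_new_link:
        # Either nothing failed, or the PE moved to a new PE-CE link at once
        # and now tracks that link instead.
        return Diag.NO_DIAGNOSTIC
    return Diag.CONCATENATED_PATH_DOWN


def should_switch_to_backup(session_state: State, received_diag: int) -> bool:
    """Downstream PE view: a signaled PE-CE failure is treated exactly like
    a failure of the tunnel itself."""
    return session_state != State.UP or int(received_diag) in PE_CE_FAILURE_DIAGS
```

With a single discriminator per tunnel, the downstream PE needs no knowledge
of which PE-CE interface the upstream PE is tracking.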

   …  If the route to the
   src/RP changes such that the RPF interface is changed to be a new PE-
   CE interface, then the upstream PE will update the S-PMSI A-D route
   with included BGP-BFD Attribute so that value of the BFD
   Discriminator is associated with the new RPF link.

If the RPF interface changes on the upstream PE, why should it update the route 
to send a new discriminator? As long as there is a new RPF interface couldn’t 
the upstream PE do nothing but start tracking the new RPF interface?
GIM>> I'll defer this one to Thomas and Rob.
===========================================
Zzh> I re-read section 3.1.6 and 3.1.7 and have more questions 😊
Zzh> 3.1.6 seems to be about tracking tunnel itself while 3.1.7 is about 
tracking PE-CE interfaces. From an egress point of view, (how) does it know if 
the discriminator is for the tunnel or for PE-CE interface 1 or PE-CE interface 
2? Does it even care? It seems to me that an egress PE would not need to care. 
If so, why are there different procedures for 3.1.6/3.1.7 (at least for the 
egress PE behavior)? Even for the upstream PE behavior, shouldn’t 3.1.6.1 apply 
to 3.1.7 as well?
GIM>> Added the following text to the first paragraph of section 3.1.7:
NEW TEXT:
The mechanism to communicate the mapping between the PE-CE link
and the associated BFD session is outside the scope of this document.

=============== the above added text does not address my questions
Zzh2> The above was still outstanding, but with my proposed new 3.1.7 section 
it is fine.

Regardless of which way (the currently described way or my imagined way), some 
text should be added to discuss how the downstream would not switch to another 
upstream PE when the primary PE is just going through an RPF change.
GIM>>  Would appending the following text be acceptable to address your concern:
NEW TEXT:
   To avoid unwarranted switchover a downstream PE MUST gracefully handle the
   updated S-PMSI A-D route and switch to the use of the associated BFD
   Discriminator value.
================= how that is done needs to be discussed
GIM3>> I think that this is an implementation issue and we just point to the 
recommended behavior without prescribing what steps must be taken to achieve it.
Zzh2> What is specified is not enough for even a vendor-dependent 
implementation. On the other hand, it is no longer a problem with my newly 
proposed section 3.1.7.
Zzh2> One more zzh2> comment below.

4.  Standby C-multicast route

   The procedures described below are limited to the case where the site
   that contains C-S is connected to exactly two PEs. The procedures
   require all the PEs of that MVPN to follow the single forwarder PE
   selection, as specified in [RFC6513].

Why would it not work with more than two upstream PEs?
Why is it limited to single forwarder selection? What about unicast based 
selection?
GIM>> Again, asking for Thomas and Rob to help.
Zzh2> I suggest we remove this paragraph. OK I see it’s already removed.
Zzh2> Jeffrey


_______________________________________________
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess
