Authors
As is customary, please find below my document shepherd review of this draft.
The comments are mainly of an editorial nature or suggest improvements to aid
readability.
Please treat these comments (prepended with MB>) as you would any other working
group last call comments.
Best regards
Matthew
===
Fast Recovery for EVPN Designated Forwarder Election
draft-ietf-bess-evpn-fast-df-recovery-05
Abstract
Ethernet Virtual Private Network (EVPN) solution provides Designated
MB> /Ethernet/The Ethernet
Forwarder election procedures for multihomed Ethernet Segments.
These procedures have been enhanced further by applying Highest
Random Weight (HRW) Algorithm for Designated Forwarded election in
order to avoid unnecessary DF status changes upon a failure. This
draft improves these procedures by providing a fast Designated
MB> /draft/document
Forwarder (DF) election upon recovery of the failed link or node
associated with the multihomed Ethernet Segment. The solution is
independent of number of EVIs associated with that Ethernet Segment
MB> /of number/of the number
and it is performed via a simple signaling between the recovered PE
and each of the other PEs in the multihoming group.
[...]
1. Introduction
Ethernet Virtual Private Network (EVPN) solution [RFC7432] is
MB> /Ethernet/The Ethernet
becoming pervasive in data center (DC) applications for Network
Virtualization Overlay (NVO) and DC interconnect (DCI) services, and
in service provider (SP) applications for next generation virtual
private LAN services.
[...]
The EVPN specification [RFC7432] describes DF election procedures for
MB> I think you just need to say [RFC7432] describes...
multihomed Ethernet Segments. These procedures are enhanced further
in [RFC8584] by applying Highest Random Weight Algorithm for DF
election in order to avoid DF status change unnecessarily upon a link
or node failure associated with the multihomed Ethernet Segment.
MB> I found the above hard to parse. Maybe replace it with:
"These procedures are enhanced further
in [RFC8584] by applying Highest Random Weight Algorithm for DF
election in order to avoid unnecessary DF status changes upon a link
or node failure associated with the multihomed Ethernet Segment."
[...]
1.1. Terminology
Provider Edge (PE): A device that sits in the boundary of Provider
and Customer networks and performs encap/decap of data from L2 to
L3 and vice-versa.
MB> Not sure you need to define PE as it is a well-known term, but in any case
I think your definition differs from the ones I could find in previous RFCs.
Maybe you can just delete it.
Designated Forwarder (DF): A PE that is currently forwarding
(encapsulating/decapsulating) traffic for a given VLAN in and out
of a site.
2. Challenges with Existing Solution
In EVPN technology, multiple PE devices have the ability to encap and
decap data belonging to the same VLAN. In certain situations, this
may cause L2 duplicates and even loops if there is a momentary
overlap of forwarding roles between two or more PE devices, leading
to broadcast storms.
EVPN [RFC7432] currently uses timer based synchronization among PE
devices in redundancy group that can result in duplications (and even
loops) because of multiple DFs if the timer is too short or
blackholing if the timer is too long.
Using split-horizon filtering (Section 8.3 of [RFC7432]) can prevent
loops (but not duplicates), however if there are overlapping DFs in
MB> I suggest you split the sentence to make it more readable:
"...(but not duplicates). However, if there are..."
two different sites at the same time for the same VLAN, the site
identifier will be different upon re-entry of the packet and hence
the split-horizon check will fail, leading to L2 loops.
[...]
However, upon PE insertion or port bring-up (recovery event), HRW
MB> Do you mean "...or port bring-up following a recovery event,"?
also cannot help as a transfer of DF role to the newly inserted
device/port must occur while the old DF is still active.
+---------+
+-------------+ | |
| | | |
/ | PE1 |----| | +-------------+
/ | | | MPLS/ | | |---CE3
/ +-------------+ | VxLAN/ | | PE3 |
CE1 - | Cloud | | |
\ +-------------+ | |---| |
\ | | | | +-------------+
\ | PE2 |----| |
| | | |
+-------------+ | |
+---------+
Figure 1: CE1 multihomed to PE1 and PE2.
In the Figure 1, when PE2 is inserted or booted up, PE1 will transfer
MB> /transfer/transfer the
DF role of some VLANs to PE2 to achieve load balancing. However,
because there is no handshake mechanism between PE1 and PE2,
duplication of DF roles for a given VLAN is possible. Duplication of
DF roles may eventually lead to duplication of traffic as well as L2
loops.
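MB> As a reading aid, here is a minimal sketch of why inserting PE2 moves the
DF role for some VLANs. It assumes the default modulo procedure of [RFC7432]
(PEs ordered by IP address, DF chosen by VLAN mod N). The function name and
the addresses are hypothetical and are not taken from the draft.

```python
import ipaddress


def modulo_df(vlan, pe_addresses):
    """Return the DF for a VLAN under the default modulo election (sketch)."""
    # Order the candidate PEs by increasing numeric IP address.
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    # The PE whose ordinal equals (VLAN mod N) is the DF for that VLAN.
    return ordered[vlan % len(ordered)]


PE1 = "192.0.2.1"  # hypothetical address for PE1
PE2 = "192.0.2.2"  # hypothetical address for PE2
vlans = range(10, 14)

before = {v: modulo_df(v, [PE1]) for v in vlans}       # only PE1 is attached
after = {v: modulo_df(v, [PE1, PE2]) for v in vlans}   # PE2 has been inserted

# VLANs whose DF changes once PE2 joins the Ethernet Segment.
moved = [v for v in vlans if before[v] != after[v]]
print(moved)
```

The VLANs in `moved` are the ones for which PE1 and PE2 must agree on a
hand-over instant, which is where the overlap described above can occur.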
Current EVPN specification [RFC7432] and [RFC8584] relies on a timer-
MB> /specification/specifications
MB> /relies/rely
based approach for transferring the DF role to the newly inserted
device. This can cause the following issues:
* Loops/Duplicates if the timer value is too short
* Prolonged Traffic Blackholing if the timer value is too long
3. DF Election Synchronization Solution
The solution relies on the concept of common clock alignment between
partner PEs participating to a common Ethernet Segment. The main
idea is to have all peering PEs of that Ethernet Segment perform DF
election, and apply their resulting carving state, at a same well-
known time.
MB> It would be clearer if you could identify the partner PEs in a figure,
e.g. Figure 1.
The DF Election procedure, as described in [RFC7432] and as
optionally signalled in [RFC8584], is applied. All PEs attached to a
given Ethernet Segment are clock-synchronized; using a networking
MB> /clock-synchronized;/clock-synchronized
protocol for clock synchronization (e.g. NTP, PTP, etc.). Newly
inserted device PE or during failure recovery of a PE, that PE
communicates the current time to peering partners plus the remaining
peering timer time left.
MB> The first part of the above does not parse. Do you mean "When a new PE is
inserted or an existing PE device recovers,..."?
This constitutes an "end time" or "absolute
time" as seen from local PE. That absolute time is called "Service
Carving Time" (SCT).
A new BGP Extended Community is advertised along with Ethernet
MB> Maybe say it is the "Service Carving Timestamp" here.
Segment route (RT-4) to communicate to other partners the Service
Carving Time.
Upon reception of that new BGP Extended Community, partner PEs know
MB> /know/can determine
exactly its carving time. The notion of skew is introduced to
eliminate any potential duplicate traffic or loops. They add a skew
MB> Who is "they"? Do you mean "The receiving partner PEs"?
(default = -10ms) to the Service Carving Time to enforce this. The
previously inserted PE(s) must carve first, followed shortly(skew) by
the newly insterted PE.
To summarize, all peering PEs carve almost simultaneously at the time
announced by newly added/recovered PE. The newly inserted PE
initiates the SCT, and carves immediately on peering timer expiry.
The previously inserted PE(s) receiving Ethernet Segment route (RT-4)
with a SCT BGP extended community, carve shortly before Service
Carving Time.
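MB> To check my reading of the timing, here is a minimal sketch of the
arithmetic as I understand it from the text above. The function names and the
3-second remaining timer are assumptions for illustration only. The -10ms
default skew is the value quoted in the draft.

```python
from datetime import datetime, timedelta, timezone

# Default skew quoted in the draft text (negative: partners carve earlier).
DEFAULT_SKEW = timedelta(milliseconds=-10)


def service_carving_time(now, remaining_peering_timer):
    """SCT advertised by the recovering PE: current time plus timer left."""
    return now + remaining_peering_timer


def partner_carve_time(sct, skew=DEFAULT_SKEW):
    """Time at which a previously inserted PE carves: SCT plus the skew."""
    return sct + skew


# Hypothetical values: a fixed clock reading and 3 seconds of timer left.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
sct = service_carving_time(now, timedelta(seconds=3))
partner = partner_carve_time(sct)

# The recovering PE carves at the SCT itself (peering timer expiry).
print("recovering PE carves at:", sct.isoformat())
print("partner PEs carve at:   ", partner.isoformat())
```

With these assumed inputs the partner PEs carve 10ms before the recovering
PE, which matches the "previously inserted PE(s) must carve first" ordering.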
3.1. Advantages
MB> This section seems out of place in a protocol spec. I suggest moving this
text to the end of the introduction.
There are multiples advantages of using the approach. Here is a non-
exhaustive list:
* A simple uni-directional signaling is all that is needed
* Backwards-compatible: PEs supporting only older [RFC7432] shall
simply discard unrecognized new "Service Carving Timestamp" BGP
Extended Community
* Multiple DF Election algorithms can be supported:
- [RFC7432] default ordered list ordinal algorithm (Modulo),
- [RFC8584] highest-random weight, etc.
* Independent of BGP transmission delay regarding Ethernet Segment
route (RT-4)
* Agnostic of the time synchronization mechanism used (e.g. NTP,
PTP, etc.)
[…]
3.2. BGP Encoding
[…]
This capability is used in conjunction with the agreed upon DF Type
(DF Election Type). For example if all the PEs in the Ethernet
Segment indicated that they have Time Synchronization capability and
they want the DF type to be HRW, then HRW algorithm is used in
MB> /then HRW/then the HRW
conjunction with this capability.
_______________________________________________
BESS mailing list
https://www.ietf.org/mailman/listinfo/bess