Hi Benjamin, Satya and I made the changes agreed on this thread and we published version 09. Version 09 is good to go from our perspective.
Benjamin, Martin, please let us know if the draft can progress now. Thank you. Jorge -----Original Message----- From: Benjamin Kaduk <ka...@mit.edu> Date: Saturday, January 19, 2019 at 2:13 AM To: "Satya Mohanty (satyamoh)" <satya...@cisco.com> Cc: "Rabadan, Jorge (Nokia - US/Mountain View)" <jorge.raba...@nokia.com>, The IESG <i...@ietf.org>, "draft-ietf-bess-evpn-df-election-framew...@ietf.org" <draft-ietf-bess-evpn-df-election-framew...@ietf.org>, Stephane Litkowski <stephane.litkow...@orange.com>, "bess-cha...@ietf.org" <bess-cha...@ietf.org> Subject: Re: Benjamin Kaduk's Discuss on draft-ietf-bess-evpn-df-election-framework-07: (with DISCUSS and COMMENT) On Sat, Jan 19, 2019 at 12:38:02AM +0000, Satya Mohanty (satyamoh) wrote: > Hi Benjamin, > > > > Thanks for your comments. My replies inline [Satya] > > > > On 1/18/19, 8:01 AM, "Benjamin Kaduk" <ka...@mit.edu> wrote: > > > > On Thu, Jan 17, 2019 at 10:10:16PM +0000, Rabadan, Jorge (Nokia - US/Mountain View) wrote: > > > Benjamin, > > > > > > Thank you very much for your review. > > > Satya and I have looked at your points one by one, please see in-line. Let us know if you are ok now. We'll publish revision 08 soon. > > > > I see that the -08 has landed since I started composing this message -- > > thanks! Some more comments inline. > > > > > Thanks.
> > > Jorge > > > > > > -----Original Message----- > > > From: Benjamin Kaduk <ka...@mit.edu> > > > Date: Tuesday, January 8, 2019 at 6:52 PM > > > To: The IESG <i...@ietf.org> > > > Cc: "draft-ietf-bess-evpn-df-election-framew...@ietf.org" <draft-ietf-bess-evpn-df-election-framew...@ietf.org>, Stephane Litkowski <stephane.litkow...@orange.com>, "bess-cha...@ietf.org" <bess-cha...@ietf.org>, "stephane.litkow...@orange.com" <stephane.litkow...@orange.com> > > > Subject: Benjamin Kaduk's Discuss on draft-ietf-bess-evpn-df-election-framework-07: (with DISCUSS and COMMENT) > > > Resent-From: <alias-boun...@ietf.org> > > > Resent-To: <jorge.raba...@nokia.com>, <satya...@cisco.com>, <saja...@cisco.com>, <jdr...@juniper.net>, <kiran.naga...@nokia.com>, <senthil.sathap...@nokia.com> > > > Resent-Date: Tuesday, January 8, 2019 at 6:51 PM > > > > > > Benjamin Kaduk has entered the following ballot position for > > > draft-ietf-bess-evpn-df-election-framework-07: Discuss > > > > > > When responding, please keep the subject line intact and reply to all > > > email addresses included in the To and CC lines. (Feel free to cut this > > > introductory paragraph, however.) > > > > > > > > > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html > > > for more information about IESG DISCUSS and COMMENT positions.
> > > > > > > > > The document, along with other ballot positions, can be found here: > > > https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-df-election-framework/ > > > > > > > > > > > > ---------------------------------------------------------------------- > > > DISCUSS: > > > ---------------------------------------------------------------------- > > > > > > > > > It's not really clear to me that the question of Updating 7432 has been > > > settled by the responses to the directorate reviews; I've noted a few > > > places in the text that are problematic in this regard, in the COMMENT > > > section. > > > > > > [JORGE] We finally agreed on making it updating 7432 and explain why in the intro/abstract. Thanks. > > > > > > > > > [concerns about combinatoric explosion were overblown; removed] > > > > > > Section 3.3 > > > > > > Section 7.6 of [RFC7432] describes how the value of the ES-Import > > > Route Target for ESI types 1, 2, and 3 can be auto-derived by using > > > the high-order six bytes of the nine byte ESI value. The same auto- > > > derivation procedure can be extended to ESI types 0, 4, and 5 as long > > > as it is ensured that the auto-derived values for ES-Import RT among > > > different ES types don't overlap. > > > > > > How do I ensure that the auto-derived values don't overlap? > > > > > > [JORGE] The autoderivation in RFC7432 for ESI types 1, 2 and 3 can be used "only if it produces ESIs that satisfy the uniqueness requirement specified above." RFC7432 does not specify how the overlap is avoided, it is out of scope, but one may think the operator must manually check that the values auto-deriving the 9-octet ESI value don't match. We added the following text, let us know if it is okay: > > > "As in [RFC7432], the mechanism to guarantee that the auto-derived ESI or ES-import RT values for different ESIs do not match is out of scope of this document." > > > > That suffices to resolve the Discuss point. 
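As a rough illustration of the non-overlap concern (the draft leaves the guaranteeing mechanism out of scope), the Python sketch below applies the RFC 7432 rule of taking the high-order six octets of the nine-octet ESI value to ESIs of any type, and reports any RT value derived from more than one ESI. It assumes the RFC 7432 ESI layout of one type octet followed by nine value octets; the function names and sample ESIs are hypothetical. The sample shows an operator-chosen type-0 value colliding with a type-1 ESI built from a CE's LACP system MAC, which is the kind of overlap an operator can avoid through the choice of type-0 values.

```python
from collections import defaultdict


def es_import_rt(esi: bytes) -> bytes:
    """Auto-derive a 6-octet ES-Import RT value from a 10-octet ESI.

    Assumes the RFC 7432 layout: 1 octet of ESI type, then a 9-octet ESI
    value; the RT value is the high-order six octets of that 9-octet value.
    """
    if len(esi) != 10:
        raise ValueError("ESI must be 10 octets (1 type octet + 9 value octets)")
    return esi[1:7]


def find_rt_overlaps(esis):
    """Return {rt_value: [esi, ...]} for every RT value derived from >1 ESI."""
    by_rt = defaultdict(list)
    for esi in esis:
        by_rt[es_import_rt(esi)].append(esi)
    return {rt: group for rt, group in by_rt.items() if len(group) > 1}


# Hypothetical ESIs: a type-1 ESI built from a CE LACP system MAC and port key,
# and two operator-chosen type-0 ESIs, the first of which reuses that MAC.
type1 = bytes.fromhex("01001122334455000100")
type0_clash = bytes.fromhex("00001122334455aabbcc")
type0_ok = bytes.fromhex("00101122334455aabbcc")

overlaps = find_rt_overlaps([type1, type0_clash, type0_ok])
print({rt.hex(): len(group) for rt, group in overlaps.items()})  # {'001122334455': 2}
```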
> > Having said that, though, Martin did some research and had some good points > > on the telechat, that type 0 has 9 octets of fully arbitrary value, whereas > > types 4 and 5 use as a base the router ID and also have a local > > discriminator. (IIRC, types 1, 2, and 3 were generally going to be unique > > as inherent properties of how they are determined, e.g., via the global > > uniqueness of MAC addresses.) I would suggest (but not insist upon) > > mentioning something about the operator being able to use the discriminator > > and type-0 arbitrary values to ensure the needed non-overlap. > > > > > > > Section 4.2 > > > > > > The ESI value MAY be set to all 0's in the Weight > > > function below if the operator so chooses. > > > > > > I'm not 100% sure I'm interpreting this correctly, but this sounds like a > > > piece of device-specific configuration (i.e., configured by the operator) > > > that must be the same across all devices for correct operation, but is not > > > covered by the advertisement of the DF Election Extended Community. This > > > would decrease the robustness of the system to basically the "experimental" > > > level of DF election algorithm 31, which also relies on universal agreement > > > of manual configuration. Is this actually something we want to include? > > > > > > [Satya] This is to accommodate the case in https://tools.ietf.org/html/draft-mohanty-bess-evpn-bum-opt-00. > > > Specifically, if the same set of PEs are multi-homed to the same set of ESes, setting the ESI to 0 would result in the "same unique" PE being the DF for a given EVI for all those ESes. > > > Use can be made of this property in the optimization in https://tools.ietf.org/html/draft-mohanty-bess-evpn-bum-opt-00. > > > Yes, it would need manual configuration to enable this. > > > > I think I did not properly convey the point I was concerned about, here.
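The failure mode at stake here can be made concrete with a hedged Python sketch of two PEs that both run HRW, where only one of them applies the all-zero ESI option quoted from Section 4.2. The weight is modeled on the Wrand()-style function from [HRW1999] that the draft adopts; the encoding fed into CRC-32 (a 4-octet Ethernet Tag followed by the 10-octet ESI) and the IPv4-only handling of Si are assumptions, and all names, addresses, and the ESI are hypothetical. With two PEs, every Ethernet Tag on which their computations disagree ends up with either two DFs or none.

```python
import ipaddress
import zlib


def hrw_weight(v: int, esi: bytes, pe_ip: str) -> int:
    """Wrand-style HRW weight for Ethernet Tag v, ESI esi and PE address pe_ip.

    Assumption: D(v, Es) is CRC-32 over a 4-octet Ethernet Tag followed by
    the 10-octet ESI, and Si is the IPv4 address read as a 32-bit integer.
    """
    d = zlib.crc32(v.to_bytes(4, "big") + esi)
    si = int(ipaddress.IPv4Address(pe_ip))
    return (1103515245 * ((1103515245 * si + 12345) ^ d) + 12345) % 2**31


def hrw_df(v: int, esi: bytes, pe_ips) -> str:
    """Highest weight wins; ties go to the numerically lowest IP address."""
    return max(pe_ips, key=lambda ip: (hrw_weight(v, esi, ip),
                                       -int(ipaddress.IPv4Address(ip))))


def zero_esi_mismatch(esi: bytes, pe1: str, pe2: str, tags):
    """PE1 feeds the real ESI into HRW; PE2 is configured to use all zeros."""
    zero_esi = bytes(10)
    both_df, no_df = [], []
    for v in tags:
        pe1_view = hrw_df(v, esi, [pe1, pe2])       # DF as computed by PE1
        pe2_view = hrw_df(v, zero_esi, [pe1, pe2])  # DF as computed by PE2
        if pe1_view == pe1 and pe2_view == pe2:
            both_df.append(v)  # each PE thinks it is the DF: duplicate BUM to the CE
        elif pe1_view == pe2 and pe2_view == pe1:
            no_df.append(v)    # each PE defers to the other: no PE forwards BUM
    return both_df, no_df


esi = bytes.fromhex("00112233445566778899")
both_df, no_df = zero_esi_mismatch(esi, "192.0.2.1", "192.0.2.2", range(1, 4095))
print(f"{len(both_df)} tags with two DFs, {len(no_df)} tags with no DF")
```

Because the DF Election Extended Community only carries the algorithm and capabilities, nothing in this setup lets the PEs detect the mismatch, which is why a separate DF Alg codepoint for the zero-ESI variant avoids the problem.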
> > What I'm concerned about is that if there are separate knobs for using HRW > > as the DF alg, and for setting the ESI input to zero, then we have a new > > failure mode in the case when all routers agree to use HRW but disagree > > about whether to set ESI to zero. In particular, it seems like that could > > cause two routers to both think they are the active DF, which is IIUC > > really bad, especially so since it defeats the mechanism introduced by this > > document to ensure that all PEs agree on what algorithm is being used (with > > fallback to the default algorithm). It would seem much safer to just have > > two DF algorithms -- one for HRW-including-ESI and another for > > HRW-with-ESI-set-to-zero (or HRW-without-ESI, really), and allow > > deployments to have the full protection of the election algorithm being > > defined here. > > > > [Satya] Agreed that you have a point. We can indeed have two algorithms, one for HRW-including-ESI and another for HRW-without-ESI. > > I can incorporate that in https://tools.ietf.org/html/draft-mohanty-bess-evpn-bum-opt-00 when we refresh it before this IETF (we are delayed on that). > > Will that suffice to address your concern? So the idea would be to remove the discussion from this document about setting ESI to zero, and have draft-mohanty-bess-evpn-bum-opt allocate a new DF Algorithm codepoint? That would be what I see as the optimal solution. > > > As an additional aside, I'm not 100% sure I see the applicability of the > > zero-ESI case to the draft-mohanty-bess-evpn-bum-opt case -- from reading > > the document itself I was wondering if the zero-ESI input was needed so > > that (using that document's Figure 1) PE5 would be able to tell which of > > PE1/2/3 were the DF for the ES attaching CE1. But my reading of your text > > above is that the goal would be to try to get all BUM traffic to that one > > "same unique" PE that is DF for *all* ESes, letting the other PEs avoid > > getting any BUM traffic.
In any case, this is just a side note, since I > > don't think it matters for the discussion in my previous paragraph here. > > [Satya] Your understanding is on the correct lines. That is the intention of the zero-ESI. > > In that case, the DF will be the same for all the ESes. > > But in that draft, since we also advertise the IMET from the BDF and other PEs that may be singly-homed, PE5 will not necessarily know who is the DF. > > > > > Section 5 > > > > > > The AC-DF capability MAY be used with any "DF Alg" algorithm. It MUST > > > > > > As written, this suggests that it is true for any current or future > > > algorithm, which is in conflict with the text in Section 3.2 that notes > > > that "for any new DF Alg defined in future, its applicability/compatibility > > > to the existing capabilities must be assessed on a case by case basis." It > > > seems more prudent to make the assessment after the relevant technologies > > > are both extant, so I would suggest this be non-normative text, perhaps > > > "the AC-DF capability is expected to be of general applicability with any > > > future 'DF Alg' algorithm". > > > > > > [JORGE] Good point. We added "The AC-DF capability is expected to be of general applicability with any future DF Algorithm." > > > > > > > > > > > > ---------------------------------------------------------------------- > > > COMMENT: > > > ---------------------------------------------------------------------- > > > > > > Section 1.2.1 > > > > > > I wonder a little bit if the risk of poor distribution of DFs with the > > > default algorithm is being oversold -- any "hash identifiers into buckets" > > > scheme will be susceptible to pessimal input, but if the inputs are not > > > attacker-controlled and the pessimal inputs are unlikely to occur randomly, > > > we may not need to care.
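The "pessimal input" question can be explored with a small Python sketch of the RFC 7432 modulo rule (PEs ordered by IP address, DF is the PE with ordinal V mod N). The PE addresses and the even-only VLAN plan are hypothetical, chosen as one plausible non-uniform provisioning pattern rather than an attacker-controlled one: with four PEs, only ordinals 0 and 2 are ever elected, and when the PE that was DF for no tag at all fails, recomputing with N-1 still moves most tags.

```python
import ipaddress
from collections import Counter


def modulo_df(tag: int, pes) -> str:
    """RFC 7432 default DF: order PEs by IP address; DF has ordinal (tag mod N)."""
    ordered = sorted(pes, key=ipaddress.IPv4Address)
    return ordered[tag % len(ordered)]


def reassigned(tags, pes_before, pes_after):
    """Tags whose DF changes when the candidate list changes."""
    return [v for v in tags
            if modulo_df(v, pes_before) != modulo_df(v, pes_after)]


pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
even_vlans = range(100, 200, 2)  # hypothetical provisioning: even VLANs only

# Non-uniform but plausible input: with N = 4, only ordinals 0 and 2 are used.
load = Counter(modulo_df(v, pes) for v in even_vlans)
print(load)  # Counter({'192.0.2.1': 25, '192.0.2.3': 25})

# 192.0.2.4 was DF for no tag, yet when it fails (N drops to 3) most tags move.
moved = reassigned(even_vlans, pes, pes[:3])
print(len(moved), "of", len(even_vlans))  # 34 of 50
```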
> > > > > > 2- Even in the case when the Ethernet Tag distribution is uniform the > > > instance of a PE being up or down results in re-computation ((v > > > mod N-1) or (v mod N+1) as is the case); the resulting modulus > > > value need not be uniformly distributed because it can be subject > > > to the primality of N-1 or N+1 as may be the case. > > > > > > This is making some assumptions about the (potential) distribution of the > > > tag values that could be made more clear, as otherwise the primality is > > > not particularly relevant (particularly for an actual uniform distribution > > > that covers all possible values). Similarly below, by the CLRS reference > > > (CLRS probably has the ability to assume that we're running on binary > > > computers and may even be doing things like operating on pointers, which > > > tend to have fixed structure in the low-order bits due to alignment > > > considerations, etc. For these (human-allocated?) integer identifiers it's > > > less clear what assumptions should come into play.) > > > > > > [Satya] As I replied to Adam: > > > > Adam had a better description of how to improve this text, and I do agree > > that the new text is much more clear -- thank you both to Adam and the > > authors for working on this! > > [Satya] Thank you for emphasizing this and bringing it to our attention. > > > > > The Ethernet tag that identifies the BD can be as large as 2^24; however, it is not guaranteed that the tenant BD on the ES will conform to a uniform distribution. In fact, it is up to the customer what BDs they will configure on the ES. Quoting Knuth [Art of Computer Programming Pg. 516] > > > " In general, we want to avoid values of M that divide r^k+a or r^k−a, where k > > > and a are small numbers and r is the radix of the alphabetic character set > > > (usually r=64, 256 or 100), since a remainder modulo such a value of M tends > > > to be largely a simple superposition of key digits.
Such considerations > > > suggest that we choose M to be a prime number such that r^k ≢ a (modulo M) and > > > r^k ≢ −a (modulo M) for small k and a." > > > In our case, N is the number of PEs in RFC 7432 which corresponds to M above. > > > Since N, N-1, or N+1 need not satisfy the primality properties of M above, under the RFC 7432 modulo-based DF assignment, whenever a PE goes down or a new PE boots up (hosting the same Ethernet Segment), the modulo scheme need not map BDs to PEs uniformly. > > > > > > Section 1.3 > > > > > > Section 2.2 describes some of the issues that exist in the Default DF > > > > > > There is no section 2.2; presumably this is supposed to be 1.2. > > > [JORGE] changed, thx. > > > > > > o HRW and AC-DF mechanisms are independent of each other. Therefore, > > > a PE MAY support either HRW or AC-DF independently or MAY support > > > both of them together. A PE MAY also support AC-DF capability along > > > with the Default DF election algorithm per [RFC7432]. > > > > > > This seems a little confusing since just a couple paragraphs ago you are > > > distinguishing between "election algorithms" and "capabilities", but here > > > the two new things (one of each type) are lumped together as "mechanisms". > > > If election algorithms and capabilities are inherently independent things, > > > then maybe there is not a need to reiterate the independence of HRW and > > > AC-DF here. > > > [JORGE] they are indeed independent, but since in the future there may be some capabilities that only make sense for certain DF Algs, we believe it is better to explicitly state here that the DF Alg and capability defined are compatible. Let us know if it is not okay. > > > > It is fine to leave as-is. > > > > > Section 3 > > > > > > This section describes the BGP extensions required to support the new > > > DF Election procedures.
In addition, since the EVPN specification > > > [RFC7432] does leave several questions open as to the precise final > > > state machine behavior of the DF election, section 3.1 describes > > > precisely the intended behavior. > > > > > > This text sounds like we should be Update:ing 7432. > > > [JORGE] yes, we finally converged into this too. Rev 08 will update 7432. Thanks. > > > > > > Section 3.2 > > > > > > - Otherwise if even a single advertisement for the type-4 route is > > > not received with the locally configured DF Alg and capability, > > > > > > nit: shouldn't this be "received without"? > > > [JORGE] fixed, thanks. > > > > > > Section 3.2.1 > > > > > > [RFC7432] implementations (i.e., those that predate this > > > specification) will not advertise the DF Election Extended Community. > > > > > > This wording also suggests that we should be Update:ing 7432. > > > [JORGE] done. Thx. > > > > > > Section 4 > > > I note that the state of the art in non-cryptographic fast hashing has > > > improved a lot since 1998 and we have things like the Jenkins hash that are > > > supposed to be superior to CRC-32 and such. > > > > > > [HRW1999] provides pseudo-random > > > functions based on the Unix utilities rand and srand and easily > > > constructed XOR functions that perform considerably well. This > > > imparts very good properties in the load balancing context. Also each > > > server independently and unambiguously arrives at the primary server > > > selection. [...] > > > > > > It's not really clear to me that this text adds much value -- we go on > > > later to say that we explicitly use a Wrand() function from HRW1999. > > > > > > [Satya] Agreed. We can take it out for brevity. We can edit this a bit as I don’t see it harming anything. > > > > > > Section 4.2 > > > > > > 1. DF(v) = Si: Weight(v, Es, Si) >= Weight(v, Es, Sj), for all j. In > > > case of a tie, choose the PE whose IP address is numerically the > > > least. 
Note 0 <= i,j < Number of PEs in the redundancy group. > > > > > > I strongly suggest expanding out the notation with more words, e.g. "DF(v) > > > is defined to be the address Si such that [...]". We probably shouldn't > > > assume much abstract math background from RFC readership. (Similarly for > > > BDF(v). The BDF(v) expression doesn't even say what the i, j, and k are > > > evaluated over.) > > > Denote the PE addresses as S0, S1, ..., SN-1. > > > [Satya] DF(v) is defined to be the address Si (index i) for which weight(v, Es, Si) is the highest, 0 <= i <= N-1 > > > > The new text is a big help, here. I still expect readers to be confused by > > the usage (in the retained enumerated list) of "DF(v) = Si:" that uses ":" > > as a shorthand for "such that" -- in my mathematics course, we used the > > vertical bar as such a shorthand, but it still remains something of a > > specialist notation that I don't expect to be familiar to the general > > reader base. > > [Satya] ok 😊 I can change it to “|”. Okay, thanks. -Benjamin > > > > Similarly, BDF(v) is defined as that PE with address Sk for which the computed weight is the next highest after the weight of the DF. > > > j is the running index from 0 to N-1, i, k are selected values. > > > > > > HRW solves the disadvantages pointed out in Section 2.2.1 and > > > ensures: > > > > > > Again, this is now Section 1.2.1 > > > [JORGE] fixed, thanks. > > > > > > o More importantly it avoids the needless disruption case of Section > > > 2.2.1 (3), that is inherent in the existing Default DF Election. > > > > > > and here. > > > (Also, this bullet point is just describing the same situation as the > > > previous one, if I understand correctly.) > > > [JORGE] fixed, thanks. > > > > > > Section 5 > > > > > > modify the DF Election procedures by removing from consideration any > > > candidate PE in the ES that cannot forward traffic on the AC that > > > belongs to the BD. [...]
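Putting the reworded HRW definitions of DF(v) and BDF(v) together with the AC-DF rule quoted from Section 5, a DF/BDF computation might look like the following Python sketch. The weight function again assumes a Wrand()-style construction with CRC-32 over a 4-octet Ethernet Tag and the 10-octet ESI, and IPv4 PE addresses; the function names, addresses, and ESI are hypothetical. AC-DF is modeled simply as filtering out PEs whose AC for the tag is not up before ranking.

```python
import ipaddress
import zlib


def hrw_weight(v: int, esi: bytes, pe_ip: str) -> int:
    """Wrand-style weight; D(v, Es) = CRC-32 of 4-octet tag + 10-octet ESI (assumed)."""
    d = zlib.crc32(v.to_bytes(4, "big") + esi)
    si = int(ipaddress.IPv4Address(pe_ip))
    return (1103515245 * ((1103515245 * si + 12345) ^ d) + 12345) % 2**31


def hrw_df_bdf(v: int, esi: bytes, es_pes, ac_up):
    """Return (DF, BDF) for Ethernet Tag v on one ES.

    es_pes: PEs that advertised an ES route for this ES.
    ac_up:  PEs whose AC for v is up (A-D per EVI route received); with AC-DF
            every other PE is removed from consideration before HRW runs.
    """
    eligible = [pe for pe in es_pes if pe in ac_up]
    # Highest weight first; on equal weight, numerically lowest IP first.
    ranked = sorted(eligible, key=lambda ip: (-hrw_weight(v, esi, ip),
                                              int(ipaddress.IPv4Address(ip))))
    df = ranked[0] if ranked else None
    bdf = ranked[1] if len(ranked) > 1 else None
    return df, bdf


pes = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
esi = bytes.fromhex("00112233445566778899")
df, bdf = hrw_df_bdf(100, esi, pes, set(pes))

# If the DF's AC for tag 100 goes down, the old BDF becomes the new DF.
new_df, _ = hrw_df_bdf(100, esi, pes, set(pes) - {df})
assert new_df == bdf
```

Because HRW ranks every candidate independently, removing the DF from consideration promotes the BDF without changing the relative order of the remaining PEs, which is the property behind the "needless disruption" argument in Section 1.2.1.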
> > > > > > What guarantees that the ACS information is available on all PEs involved > > > in the election? > > > [JORGE] the ACS information is available on all PEs since it is distributed by the A-D routes as explained later. The withdrawal of an A-D per-EVI route indicates the AC state goes down. But since this is the behavior in RFC7432, we don't think there is a need to explain that. > > > > Ah, that makes sense. Thank you for explaining it to me, with my inexpert > > 7432 knowledge. > > > > > In particular, when used with the Default DF Alg, the AC-DF > > > capability modifies the Step 3 in the DF Election procedure described > > > in [RFC7432] Section 8.5, as follows: > > > > > > Only a single paragraph follows, but the referenced document has three > > > paragraphs in the indicated step. Are the last two paragraphs no longer > > > intended to apply? In particular, if we apply this paragraph as a direct > > > replacement for the RFC 7432 step 3, then there is no longer a normative > > > description of the modulus-based algorithm, which seems incorrect. Also, > > > there's a lot of style/editorial changes, that make the difference in > > > behavior harder to read from the diff. (Side note: I don't think this > > > particular text implies that this document needs an Updates: relation to > > > RFC 7432, since it is a behavior change conditional on the use of a > > > negotiated feature.) > > > [JORGE] we changed the text to the following. Hopefully it makes it clear: > > > ---------------------- > > > In particular, when used with the Default DF Alg, the AC-DF > > > capability modifies the Step 3 in the DF Election procedure described > > > in [RFC7432] Section 8.5, as follows: > > > > > > 3. When the timer expires, each PE builds an ordered "candidate" list > > > of the IP addresses of all the PE nodes attached to the Ethernet > > > Segment (including itself), in increasing numeric value. 
The > > > candidate list is based on the Originator Router's IP addresses of > > > the ES routes, but excludes any PE from whom no Ethernet A-D per > > > ES route has been received, or from whom the route has been > > > withdrawn. Afterwards, the DF Election algorithm is applied on a > > > per <ES, Ethernet Tag>, however, the IP address for a PE will not > > > be considered candidate for a given <ES, Ethernet Tag> until the > > > corresponding Ethernet A-D per EVI route has been received from > > > that PE. In other words, the ACS on the ES for a given PE must be > > > UP so that the PE is considered as candidate for a given BD. If > > > the Default DF Alg is used, every PE in the resulting candidate > > > list is then given an ordinal indicating its position in the > > > ordered list, starting with 0 as the ordinal for the PE with the > > > numerically lowest IP address. The ordinals are used to determine > > > which PE node will be the DF for a given Ethernet Tag on the > > > Ethernet Segment, using the following rule: > > > > > > Assuming a redundancy group of N PE nodes, for VLAN-based service, > > > the PE with ordinal i is the DF for an <ES, Ethernet Tag V> when > > > (V mod N)= i. In the case of VLAN-(aware) bundle service, then the > > > numerically lowest VLAN value in that bundle on that ES MUST be > > > used in the modulo function as Ethernet Tag. > > > > > > It should be noted that using the "Originating Router's IP > > > address" field in the Ethernet Segment route to get the PE IP > > > address needed for the ordered list allows for a CE to be > > > multihomed across different ASes if such a need ever arises. > > > > That is a big help, thanks. I might consider starting a new paragraph for > > "If the Default DF Alg is used[...]", but that's a very minor point. 
> > > > > <snip> > > > ---------------------- > > > > > > a) When PE1 and PE2 discover ES12, they advertise an ES route for > > > ES12 with the associated ES-import extended community and the DF > > > Election Extended Community indicating AC-DF=1; they start a timer > > > at the same time. [...] > > > > > > (nit?) This text implies some synchronization between PE1 and PE2 for > > > starting the timer, whereas I think the intent is just to note that they > > > each start a timer as they advertise the route, independently of each other. > > > [JORGE] good catch. Changed to: > > > "a) When PE1 and PE2 discover ES12, they advertise an ES route for ES12 with the associated ES-import extended community and the DF Election Extended Community indicating AC-DF=1; they start a DF Wait timer (independently). Likewise, PE2 and PE3 advertise an ES route for ES23 with AC-DF=1 and start a DF Wait timer." > > > > > > > > > In addition to the events defined in the FSM in Section 3.1, the > > > following events SHALL modify the candidate PE list and trigger the > > > DF re-election in a PE for a given <ES,VLAN> or <ES,VLAN Bundle>. In > > > the FSM of Figure 3, the events below MUST trigger a transition from > > > DF_DONE to DF_CALC: > > > > > > Then why are they not listed as part of the referenced FSM (or at least > > > mentioned with a forward-reference)? > > > [JORGE] we added the following at the end of section 3.1: > > > "The above events and transitions are defined for the Default DF Election Algorithm. As described in Section 5, the use of the AC-DF capability introduces additional events and transitions." > > > > Sounds good. > > > > > Section 7 > > > > > > Are there any considerations to discuss about increased resource > > > consumption (e.g., for storing and transmitting Ethernet A-Ds per-<ES,VLAN> > > > vs. per-<ES,VLAN Bundle>) and the risk of DoS due to reaching resource > > > caps?
> > > [JORGE] we don't think there are additional security considerations since there are no additional Ethernet A-D routes suggested by the procedures in this document. The procedures suggest to change the DF Election and make it per VLAN as opposed to per VLAN bundle, but the amount of routes needed does not change. > > > > Okay, thanks for thinking about it. > > > > -Benjamin > > > > > > > > > > > Note that the network will not benefit of the new > > > procedures if the configuration of one of the PEs in the ES is > > > changed to the Default [RFC7432] DF Election. > > > > > > Isn't this the case if there is not unanimity among all PEs in the ES about > > > what election algorithm is preferred, which is a broader possible case than > > > one being changed to use the default algorithms? > > > [JORGE] ok, changed to: > > > "Note that the network will not benefit of the new procedures if the DF Election Alg is not consistently configured on all the PEs in the ES (if there is no unanimity among all the PEs, the DF Election Alg falls back to the Default [RFC7432] DF Election)." > > > > > > Section 8 > > > > > > o Allocate Sub-Type value 0x06 in the "EVPN Extended Community Sub- > > > Types" registry defined in [RFC7153] as follows: > > > > > > Sometimes we see language about "confirm the existing early allocation", > > > but I assume that the RFC Editor and IANA have a standard way of sorting > > > this stuff out. > > > [JORGE] ok, we'll wait for RFC Editor / IANA edits. > > > > > > > > > > > > > Thanks, > > --Satya > > _______________________________________________ BESS mailing list BESS@ietf.org https://www.ietf.org/mailman/listinfo/bess