On Sat, Jan 19, 2019 at 12:38:02AM +0000, Satya Mohanty (satyamoh) wrote: > Hi Benjamin, > > > > Thanks for your comments. My replies inline [Satya] > > > > On 1/18/19, 8:01 AM, "Benjamin Kaduk" <[email protected]> wrote: > > > > On Thu, Jan 17, 2019 at 10:10:16PM +0000, Rabadan, Jorge (Nokia - > US/Mountain View) wrote: > > > Benjamin, > > > > > > Thank you very much for your review. > > > Satya and I have looked at your points one by one, please see in-line. > Let us know if you are ok now. We'll publish revision 08 soon. > > > > I see that the -08 has landed since I started composing this message -- > > thanks! Some more comments inline. > > > > > Thanks. > > > Jorge > > > > > > -----Original Message----- > > > From: Benjamin Kaduk <[email protected]> > > > Date: Tuesday, January 8, 2019 at 6:52 PM > > > To: The IESG <[email protected]> > > > Cc: "[email protected]" > <[email protected]>, Stephane Litkowski > <[email protected]>, "[email protected]" > <[email protected]>, "[email protected]" > <[email protected]>, "[email protected]" <[email protected]> > > > Subject: Benjamin Kaduk's Discuss on > draft-ietf-bess-evpn-df-election-framework-07: (with DISCUSS and COMMENT) > > > Resent-From: <[email protected]> > > > Resent-To: <[email protected]>, <[email protected]>, > <[email protected]>, <[email protected]>, <[email protected]>, > <[email protected]> > > > Resent-Date: Tuesday, January 8, 2019 at 6:51 PM > > > > > > Benjamin Kaduk has entered the following ballot position for > > > draft-ietf-bess-evpn-df-election-framework-07: Discuss > > > > > > When responding, please keep the subject line intact and reply to > all > > > email addresses included in the To and CC lines. (Feel free to cut > this > > > introductory paragraph, however.) > > > > > > > > > Please refer to > https://www.ietf.org/iesg/statement/discuss-criteria.html > > > for more information about IESG DISCUSS and COMMENT positions. 
> > > > > > > > > The document, along with other ballot positions, can be found here: > > > > https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-df-election-framework/ > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > DISCUSS: > > > > ---------------------------------------------------------------------- > > > > > > > > > It's not really clear to me that the question of Updating 7432 > has been > > > settled by the responses to the directorate reviews; I've noted > a few > > > places in the text that are problematic in this regard, in the > COMMENT > > > section. > > > > > > [JORGE] We finally agreed on making it updating 7432 and explain why > in the intro/abstract. Thanks. > > > > > > > > > [concerns about combinatoric explosion were overblown; removed] > > > > > > Section 3.3 > > > > > > Section 7.6 of [RFC7432] describes how the value of the > ES-Import > > > Route Target for ESI types 1, 2, and 3 can be auto-derived > by using > > > the high-order six bytes of the nine byte ESI value. The > same auto- > > > derivation procedure can be extended to ESI types 0, 4, and > 5 as long > > > as it is ensured that the auto-derived values for ES-Import > RT among > > > different ES types don't overlap. > > > > > > How do I ensure that the auto-derived values don't overlap? > > > > > > [JORGE] The autoderivation in RFC7432 for ESI types 1, 2 and 3 can be > used "only if it produces ESIs that satisfy the uniqueness requirement > specified above." RFC7432 does not specify how the overlap is avoided, it is > out of scope, but one may think the operator must manually check that the > values auto-deriving the 9-octet ESI value don't match. We added the > following text, let us know if it is okay: > > > "As in [RFC7432], the mechanism to guarantee that the auto-derived ESI > or ES-import RT values for different ESIs do not match is out of scope of > this document." > > > > That suffices to resolve the Discuss point. 
> > Having said that, though, Martin did some research and had some good > points > > on the telechat, that type 0 has 9 octets of fully arbitrary value, > whereas > > types 4 and 5 use as a base the router ID and also have a local > > discriminator. (IIRC, types 1, 2, and 3 were generally going to be unique > > as inherent properties of how they are determined, e.g., via the global > > uniqueness of MAC addresses.) I would suggest (but not insist upon) > > mentioning something about the operator being able to use the > discriminator > > and type-0 arbitrary values to ensure the needed non-overlap. > > > > > > > Section 4.2 > > > > > > The ESI value MAY be set to all 0's in the > Weight > > > function below if the operator so chooses. > > > > > > I'm not 100% sure I'm interpreting this correctly, but this > sounds like a > > > piece of device-specific configuration (i.e., configured by the > operator) > > > that must be the same across all devices for correct operation, > but is not > > > covered by the advertisement of the DF Election Extended > Community. This > > > would decrease the robustness of the system to basically the > "experimental" > > > level of DF election algorithm 31, which also relies on > universal agreement > > > of manual configuration. Is this actually something we want to > include? > > > > > > [Satya] This is to accommodate the case in > https://tools.ietf.org/html/draft-mohanty-bess-evpn-bum-opt-00. > > > Specifically, if the same set of PEs are multi-homed to the same set of > ESes, setting the ESI to 0 would result in the "same unique" PE being the DF for > a given EVI for all those ESes. > > > Use can be made of this property in the optimization in > https://tools.ietf.org/html/draft-mohanty-bess-evpn-bum-opt-00. > > > Yes, it would need manual configuration to enable this. > > > > I think I did not properly convey the point I was concerned about, here.
> > What I'm concerned about is that if there are separate knobs for using HRW > > as the DF alg, and for setting the ESI input to zero, then we have a new > > failure mode in the case when all routers agree to use HRW but disagree > > about whether to set ESI to zero. In particular, it seems like that could > > cause two routers to both think they are the active DF, which is IIUC > > really bad, especially so since it defeats the mechanism introduced by > this > > document to ensure that all PEs agree what algorithm is being used (with > > fallback to the default algorithm). It would seem much safer to just have > > two DF algorithms -- one for HRW-including-ESI and another for > > HRW-with-ESI-set-to-zero (or HRW-without-ESI, really), and allow > > deployments to have the full protection of the election algorithm being > > defined here. > > > > [Satya] Agreed that you have a point. We can indeed have two algorithms, one > for HRW-including-ESI and another for HRW-without-ESI. > > I can incorporate that in > https://tools.ietf.org/html/draft-mohanty-bess-evpn-bum-opt-00 when we > refresh it before this IETF (we are delayed on that). > > Will that suffice to address your concern?
So the idea would be to remove the discussion from this document about setting ESI to zero, and have draft-mohanty-bess-evpn-bum-opt allocate a new DF Algorithm codepoint? That would be what I see as the optimal solution.
> > > As an additional aside, I'm not 100% sure I see the applicability of the > > zero-ESI case to the draft-mohanty-bess-evpn-bum-opt case -- from reading > > the document itself I was wondering if the zero-ESI input was needed so > > that (using that document's Figure 1) PE5 would be able to tell which of > > PE1/2/3 were the DF for the ES attaching CE1. But my reading of your text > > above is that the goal would be to try to get all BUM traffic to that one > > "same unique" PE that is DF for *all* ESes, letting the other PEs avoid > > getting any BUM traffic. In any case, this is just a side note, since I > > don't think it matters for the discussion in my previous paragraph here. > > [Satya] Your understanding is along the correct lines. That is the intention of > the zero-ESI. > > In that case, the DF will be the same for all the ESes. > > But in that draft, since we also advertise the IMET from the BDF and other > PEs that may be singly-homed, PE5 will not necessarily know who is the DF. > > > > > Section 5 > > > > > > The AC-DF capability MAY be used with any "DF Alg" > algorithm. It MUST > > > > > > As written, this suggests that it is true for any current or > future > > > algorithm, which is in conflict with the text in Section 3.2 > that notes > > > that "for any new DF Alg defined in future, its > applicability/compatibility > > > to the existing capabilities must be assessed on a case by case > basis." It > > > seems more prudent to make the assessment after the relevant > technologies > > > are both extant, so I would suggest this be non-normative text, > perhaps > > > "the AC-DF capability is expected to be of general > applicability with any > > > future 'DF Alg' algorithm". > > > > > > [JORGE] Good point.
We added "The AC-DF capability is expected to be of > general applicability with any future DF Algorithm." > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > COMMENT: > > > > ---------------------------------------------------------------------- > > > > > > Section 1.2.1 > > > > > > I a little bit wonder if the risk of poor distribution of DFs > with the > > > default algorithm is being oversold -- any "hash identifiers > into buckets" > > > scheme will be susceptible to pessimal input, but if the inputs > are not > > > attacker-controlled and the pessimal inputs are unlikely to > occur randomly, > > > we may not need to care. > > > > > > 2- Even in the case when the Ethernet Tag distribution is > uniform the > > > instance of a PE being up or down results in > re-computation ((v > > > mod N-1) or (v mod N+1) as is the case); the resulting > modulus > > > value need not be uniformly distributed because it can be > subject > > > to the primality of N-1 or N+1 as may be the case. > > > > > > This is making some assumptions about the (potential) > distribution of the > > > tag values that could be made more clear, as otherwise the > primality is > > > not particularly relevant (particularly for an actual uniform > distribution > > > that covers all possible values). Similarly below, by the CLRS > reference > > > (CLRS probably has the ability to assume that we're running on > binary > > > computers and may even be doing things like operating on > pointers, which > > > tend to have fixed structure in the low-order bits due to > alignment > > > considerations, etc. For these (human-allocated?) integer > identifiers it's > > > less clear what assumptions should come into play.) > > > > > > [Satya] As I replied to Adam: > > > > Adam had a better description of how to improve this text, and I do agree > > that the new text is much more clear -- thank you both to Adam and the > > authors for working on this! 
> > [Satya] Thank you for emphasizing this and bringing it to our attention. > > > > > The Ethernet tag that identifies the BD can be as large as 2^24; > however, it is not guaranteed that the tenant BD on the ES will conform to a > uniform distribution. In fact, it is up to the customer what BDs they will > configure on the ES. Quoting Knuth [Art of Computer Programming Pg. 516] > > > " In general, we want to avoid values of M that divide r^k+a or > r^k−a, where k > > > and a are small numbers and r is the radix of the alphabetic > character set > > > (usually r=64, 256 or 100), since a remainder modulo such a value > of M tends > > > to be largely a simple superposition of key digits. Such > considerations > > > suggest that we choose M to be a prime number such that > r^k!=a(modulo)M or > > > r^k!=−a(modulo)M for small k & a." > > > In our case, N is the number of PEs in RFC 7432 which corresponds to M > above. > > > Since N, N-1 or N+1 need not satisfy the primality properties of the M > above, as per RFC 7432 modulo based DF assignment, whenever a PE goes down or > a new PE boots up (hosting the same Ethernet Segment), the modulo scheme need > not necessarily map BDs to PEs uniformly. > > > > > > Section 1.3 > > > > > > Section 2.2 describes some of the issues that exist in the > Default DF > > > > > > There is no section 2.2; presumably this is supposed to be 1.2. > > > [JORGE] changed, thx. > > > > > > o HRW and AC-DF mechanisms are independent of each other. > Therefore, > > > a PE MAY support either HRW or AC-DF independently or MAY > support > > > both of them together. A PE MAY also support AC-DF > capability along > > > with the Default DF election algorithm per [RFC7432]. > > > > > > This seems a little confusing since just a couple paragraphs > ago you are > > > distinguishing between "election algorithms" and > "capabilities", but here > > > the two new things (one of each type) are lumped together as > "mechanisms".
> > > If election algorithms and capabilities are inherently > independent things, > > > then maybe there is not a need to reiterate the independence of > HRW and > > > AC-DF here. > > > [JORGE] they are indeed independent, but since in the future may be > some capabilities that only make sense for certain DF Algs, we believe it is > better to explicitly state here that the DF Alg and capability defined are > compatible. Let us know if it is not okay. > > > > It is fine to leave as-is. > > > > > Section 3 > > > > > > This section describes the BGP extensions required to > support the new > > > DF Election procedures. In addition, since the EVPN > specification > > > [RFC7432] does leave several questions open as to the > precise final > > > state machine behavior of the DF election, section 3.1 > describes > > > precisely the intended behavior. > > > > > > This text sounds like we should be Update:ing 7432. > > > [JORGE] yes, we finally converged into this too. Rev 08 will update > 7432. Thanks. > > > > > > Section 3.2 > > > > > > - Otherwise if even a single advertisement for the type-4 > route is > > > not received with the locally configured DF Alg and > capability, > > > > > > nit: shouldn't this be "received without"? > > > [JORGE] fixed, thanks. > > > > > > Section 3.2.1 > > > > > > [RFC7432] implementations (i.e., those that predate this > > > specification) will not advertise the DF Election Extended > Community. > > > > > > This wording also suggests that we should be Update:ing 7432. > > > [JORGE] done. Thx. > > > > > > Section 4 > > > I note that the state of the art in non-cryptographic fast > hashing has > > > improved a lot since 1998 and we have things like the Jenkins > hash that are > > > supposed to be superior to CRC-32 and such. > > > > > > [HRW1999] provides > pseudo-random > > > functions based on the Unix utilities rand and srand and > easily > > > constructed XOR functions that perform considerably well. 
> This > > > imparts very good properties in the load balancing context. > Also each > > > server independently and unambiguously arrives at the > primary server > > > selection. [...] > > > > > > It's not really clear to me that this text adds much value -- > we go on > > > later to say that we explicitly use a Wrand() function from > HRW1999. > > > > > > [Satya] Agreed. We can take it out for brevity. We can edit this a bit > as I don’t see it harming anything. > > > > > > Section 4.2 > > > > > > 1. DF(v) = Si: Weight(v, Es, Si) >= Weight(v, Es, Sj), for > all j. In > > > case of a tie, choose the PE whose IP address is > numerically the > > > least. Note 0 <= i,j < Number of PEs in the redundancy > group. > > > > > > I strongly suggest expanding out the notation with more words, > e.g. "DF(v) > > > is defined to be the address Si such that [...]". We probably > shouldn't > > > assume much abstract math background from RFC readership. > (Similarly for > > > BDF(v). The BDF(v) expression doesn't even say what the i, j, > and k are > > > evaluated over.) > > > Denote the PEs' addresses as S0, S1, .. SN-1. > > > [Satya] DF(v): is defined to be the address Si (index i) for which > weight(v, Es, Si) is the highest, 0 <= i <= N-1 > > > > The new text is a big help, here. I still expect readers to be confused > by > > the usage (in the retained enumerated list) of "DF(v) = Si:" that uses ":" > > as a shorthand for "such that" -- in my mathematics course, we used the > > vertical bar as such a shorthand, but it still remains something of a > > specialist notation that I don't expect to be familiar to the general > > reader base. > > [Satya] ok 😊 I can change with “|”. Okay, thanks. -Benjamin > > > > Similarly, BDF(v) is defined as that PE with address Sk for which the > computed weight is the next highest after the weight of the DF. > > > j is the running index from 0 to N-1, i, k are selected values.
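For readers following the notation, the DF(v) and BDF(v) definitions above can be sketched in Python. This is an illustration only, under stated assumptions: the `weight` function is a stand-in built on CRC-32 and is not the Wrand() function from HRW1999 that the draft uses, and the names `weight` and `hrw_elect` are hypothetical. The tie-break (numerically lowest IP address) and the "next highest weight" rule for the BDF follow the text quoted above.

```python
import ipaddress
import zlib


def weight(v: int, es: bytes, pe_ip: str) -> int:
    # Stand-in weight (assumption): CRC-32 over Ethernet Tag, ESI and PE address.
    # The draft's actual Weight() is based on Wrand() from HRW1999.
    addr = ipaddress.ip_address(pe_ip).packed
    return zlib.crc32(v.to_bytes(4, "big") + es + addr)


def hrw_elect(v: int, es: bytes, pe_ips: list[str]):
    # Rank PEs by highest weight first; break ties by numerically lowest IP.
    ranked = sorted(
        pe_ips,
        key=lambda ip: (-weight(v, es, ip), int(ipaddress.ip_address(ip))),
    )
    df = ranked[0]                               # DF(v): highest weight
    bdf = ranked[1] if len(ranked) > 1 else None  # BDF(v): next highest
    return df, bdf
```

Because each PE's weight is computed independently of the other PEs, removing a PE that is not the DF leaves the DF unchanged, and removing the DF promotes the BDF. The result also does not depend on the order in which the PE addresses are listed.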
> > > > > > HRW solves the disadvantages pointed out in Section 2.2.1 and > > > ensures: > > > > > > Again, this is now Section 1.2.1 > > > [JORGE] fixed, thanks. > > > > > > o More importantly it avoids the needless disruption case of > Section > > > 2.2.1 (3), that is inherent in the existing Default DF > Election. > > > > > > and here. > > > (Also, this bullet point is just describing the same situation > as the > > > previous one, if I understand correctly.) > > > [JORGE] fixed, thanks. > > > > > > Section 5 > > > > > > modify the DF Election procedures by removing from > consideration any > > > candidate PE in the ES that cannot forward traffic on the AC > that > > > belongs to the BD. [...] > > > > > > What guarantees that the ACS information is available on all > PEs involved > > > in the election? > > > [JORGE] the ACS information is available on all PEs since it is > distributed by the A-D routes as explained later. The withdrawal of an A-D > per-EVI route indicates the AC state goes down. But since this is the > behavior in RFC7432, we don't think there is a need to explain that. > > > > Ah, that makes sense. Thank you for explaining it to me, with my inexpert > > 7432 knowledge. > > > > > In particular, when used with the Default DF Alg, the AC-DF > > > capability modifies the Step 3 in the DF Election procedure > described > > > in [RFC7432] Section 8.5, as follows: > > > > > > Only a single paragraph follows, but the referenced document > has three > > > paragraphs in the indicated step. Are the last two paragraphs > no longer > > > intended to apply? In particular, if we apply this paragraph > as a direct > > > replacement for the RFC 7432 step 3, then there is no longer a > normative > > > description of the modulus-based algorithm, which seems > incorrect. Also, > > > there's a lot of style/editorial changes, that make the > difference in > > > behavior harder to read from the diff. 
(Side note: I don't > think this > > > particular text implies that this document needs an Updates: > relation to > > > RFC 7432, since it is a behavior change conditional on the use > of a > > > negotiated feature.) > > > [JORGE] we changed the text to the following. Hopefully it makes it > clear: > > > ---------------------- > > > In particular, when used with the Default DF Alg, the AC-DF > > > capability modifies the Step 3 in the DF Election procedure > described > > > in [RFC7432] Section 8.5, as follows: > > > > > > 3. When the timer expires, each PE builds an ordered "candidate" > list > > > of the IP addresses of all the PE nodes attached to the > Ethernet > > > Segment (including itself), in increasing numeric value. The > > > candidate list is based on the Originator Router's IP > addresses of > > > the ES routes, but excludes any PE from whom no Ethernet A-D > per > > > ES route has been received, or from whom the route has been > > > withdrawn. Afterwards, the DF Election algorithm is applied on > a > > > per <ES, Ethernet Tag>, however, the IP address for a PE will > not > > > be considered candidate for a given <ES, Ethernet Tag> until > the > > > corresponding Ethernet A-D per EVI route has been received > from > > > that PE. In other words, the ACS on the ES for a given PE > must be > > > UP so that the PE is considered as candidate for a given BD. > If > > > the Default DF Alg is used, every PE in the resulting > candidate > > > list is then given an ordinal indicating its position in the > > > ordered list, starting with 0 as the ordinal for the PE with > the > > > numerically lowest IP address. The ordinals are used to > determine > > > which PE node will be the DF for a given Ethernet Tag on the > > > Ethernet Segment, using the following rule: > > > > > > Assuming a redundancy group of N PE nodes, for VLAN-based > service, > > > the PE with ordinal i is the DF for an <ES, Ethernet Tag V> > when > > > (V mod N)= i. 
In the case of VLAN-(aware) bundle service, > then the > > > numerically lowest VLAN value in that bundle on that ES MUST > be > > > used in the modulo function as Ethernet Tag. > > > > > > It should be noted that using the "Originating Router's IP > > > address" field in the Ethernet Segment route to get the PE IP > > > address needed for the ordered list allows for a CE to be > > > multihomed across different ASes if such a need ever arises. > > > > That is a big help, thanks. I might consider starting a new paragraph for > > "If the Default DF Alg is used[...]", but that's a very minor point. > > > > > <snip> > > > ---------------------- > > > > > > a) When PE1 and PE2 discover ES12, they advertise an ES > route for > > > ES12 with the associated ES-import extended community and > the DF > > > Election Extended Community indicating AC-DF=1; they > start a timer > > > at the same time. [...] > > > > > > (nit?) This text implies some synchronization between PE1 and > PE2 for > > > starting the timer, whereas I think the intent is just to note > that they > > > each start a timer as they advertise the route, independently > of each other. > > > [JORGE] good catch. Changed to: > > > "a) When PE1 and PE2 discover ES12, they advertise an ES route for > ES12 with the associated ES-import extended community and the DF Election > Extended Community indicating AC-DF=1; they start a DF Wait timer > (independently). Likewise, PE2 and PE3 advertise an ES route for ES23 with > AC-DF=1 and start a DF Wait timer." > > > > > > > > > In addition to the events defined in the FSM in Section 3.1, > the > > > following events SHALL modify the candidate PE list and > trigger the > > > DF re-election in a PE for a given <ES,VLAN> or <ES,VLAN > Bundle>. In > > > the FSM of Figure 3, the events below MUST trigger a > transition from > > > DF_DONE to DF_CALC: > > > > > > Then why are they not listed as part of the referenced FSM (or > at least > > > mentioned with a forward-reference)? 
> > > [JORGE] we added the following at the end of section 3.1: > > > "The above events and transitions are defined for the Default DF > Election Algorithm. As described in Section 5, the use of the AC-DF > capability introduces additional events and transitions." > > > > Sounds good. > > > > > Section 7 > > > > > > Are there any considerations to discuss about increased resource > > > consumption (e.g., for storing and transmitting Ethernet A-Ds > per-<ES,VLAN> > > > vs. per-<ES,VLAN Bundle>) and the risk of DoS due to reaching > resource > > > caps? > > > [JORGE] we don't think there are additional security considerations > since there are no additional Ethernet A-D routes suggested by the procedures > in this document. The procedures suggest changing the DF Election and making > it per VLAN as opposed to per VLAN bundle, but the number of routes needed > does not change. > > > > Okay, thanks for thinking about it. > > > > -Benjamin > > > > > > > > > > > Note that the network will not benefit of the > new > > > procedures if the configuration of one of the PEs in the ES > is > > > changed to the Default [RFC7432] DF Election. > > > > > > Isn't this the case if there is not unanimity among all PEs in > the ES about > > > what election algorithm is preferred, which is a broader > possible case than > > > one being changed to use the default algorithms? > > > [JORGE] ok, changed to: > > > "Note that the network will not benefit of the new procedures if > the DF Election Alg is not consistently configured on all the PEs in the ES > (if there is no unanimity among all the PEs, the DF Election Alg falls back > to the Default [RFC7432] DF Election)."
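The unanimity rule in the revised text can be sketched as follows. This is a rough illustration, not text from the draft: the function names, the `(alg, ac_df)` tuple used to model the advertised DF Alg and capability, and the use of `None` for a peer that sent no DF Election Extended Community are all assumptions of this sketch. The modulo fallback follows the rule quoted earlier in the thread (candidates ordered by increasing IP address, the PE with ordinal i is DF when (V mod N) = i).

```python
import ipaddress


def all_agree(local, advertised):
    # True only if every peer's ES route carried the same (DF Alg, capability)
    # pair as the local configuration. None models a peer that advertised no
    # DF Election Extended Community at all.
    return all(peer == local for peer in advertised)


def default_df(v, pe_ips):
    # Default modulo election: order candidates by increasing IP address and
    # pick the PE whose ordinal equals (v mod N).
    ordered = sorted(pe_ips, key=lambda ip: int(ipaddress.ip_address(ip)))
    return ordered[v % len(ordered)]


def elect(v, pe_ips, local, advertised, preferred):
    # Run the preferred algorithm (a callable) only under unanimity;
    # otherwise fall back to the Default election.
    if all_agree(local, advertised):
        return preferred(v, pe_ips)
    return default_df(v, pe_ips)
```

A single peer that advertises a different value, or nothing at all, is enough to send every PE in the ES down the `default_df` path, which is the behavior the revised note describes.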
> > > > > > Section 8 > > > > > > o Allocate Sub-Type value 0x06 in the "EVPN Extended > Community Sub- > > > Types" registry defined in [RFC7153] as follows: > > > > > > Sometimes we see language about "confirm the existing early > allocation", > > > but I assume that the RFC Editor and IANA have a standard way > of sorting > > > this stuff out. > > > [JORGE] ok, we'll wait for RFC Editor / IANA edits. > > > > > > > > > > > > > Thanks, > > --Satya
