Hi Bob, Gorry, all,

A few comments inline:
> On 6. aug. 2015, at 16.23, Bob Briscoe <[email protected]> wrote: > > Gorry, > > On 06/08/15 09:56, Gorry Fairhurst wrote: >> A few comments in-line, as someone interested in taking things forward: >> >> On 04/08/2015 13:42, Bob Briscoe wrote: >>> John, >>> >>> [Ignore previous - clicked 'send' too early] >>> >>> There was much discussion about this identifier issue in Prague. Many >>> people are well-aware that the sticking point is availability of an >>> identifier, and its associated politics. >>> >>> >>> On 04/08/15 05:08, John Leslie wrote: >>>> Bob Briscoe <[email protected]> wrote: >>>>> I do not believe an IP (v4) option or a v6 extension would be necessary. >>>>> If ECT(1) were used that would surely be sufficient alone. >>>> Alas, we're facing a political question: not just a technical one. >>>> >>>> I think folks are ready to deprecate ECN Nonce; but I'm not >>>> optimistic that folks are ready to embrace "Low-Latency, Low-Loss and >>>> Scalable" (L4S) service, as introduced in draft-briscoe-aqm-dualq-coupled >>>> (Has this draft been posted?). >>>> >>>> (IMHO, this work promises to be very valuable for the _many_ uses >>>> that are latency-sensitive; but adoption is going to be a major >>>> challenge!) >>>> >>>> We may indeed eventually get to where Bob is thinking today; but >>>> I don't see a clear path to IETF-wide consensus yet. Even getting an >>>> Experimental RFC approved which re-purposes ECT(1) strikes me as a >>>> very significant challenge. :^( >>>> >> However, it's not the ECT(1) codepoint that would be problematic, but more >> the fact that there is only a single CE-mark. Hence, if one redefined ECT(1) >> to indicate a different use of ECN in the network, then it would likely need >> to result in CE-marking. Such packets would be indistinguishable from >> packets marked from ECT(0). All CE-marked packets would then need to be queued (and >> counted) and accounted for in the same way in the marking algorithms. 
As far as I >> understand, that isn't what the dual-queue method was wanting. > As I explained later, I believe a dual-use CE marking is not a problem. As > long as CE is always promoted (to L4S), not demoted (to Classic), it is OK. > > I.e., if some CE packets had started life as ECT(0) (Classic), misclassifying > them into the L4S queue would only forward them faster, not slower. They > would arrive earlier out of order than the rest of the Classic flow, but > earlier is OK, and it would be far less re-ordering than retransmission > already causes today. > >>>>> I believe the main criteria for an identifier for this new service are: >>>>> 1. preferably orthogonal to Diffserv classes. >>>> IMHO, Diffserv classes are poison! >>>> >>>> There are a number of good folks pursuing a Less-than-Best-Effort >>>> diffserv class. I wish them luck! But I'd be amazed if they succeed. >>>> >>>> Diffserv classes are the private preserve of a _large_ number of >>>> network-service-providers. Best-Effort is the only one with universal >>>> agreement on what it means. >>>> >>>> All the others are subject to non-documented shuffling at the >>>> boundaries between providers (and "bleaching" to Best-Effort at many >>>> points within providers' networks). >>> >>> DSCPs are only poison in standards. As you say, ISPs use DSCPs for many >>> internal services already. >>> >> There may be signs of hope here for increased use of network-wide markings >> (ever optimistic). >>> (BTW, the request for a global scope DSCP for less-than-BE is very >>> different to this case, because bleaching LBE (remarking to the all-zeros >>> DSCP at borders) always /increases/ its priority.) >>> >>>>> 2. preferably end-to-end in scope >>>> I'm really not sure how we can make L4S useful if it lacks an >>>> End-to-End meaning. The signal must enter at the sender and mostly >>>> survive all the way to the receiver in order that the receiver >>>> (by whatever magic) can tell the sender about any congestion. 
>>> >>> ISPs could go ahead with using a local DSCP now for L4S for their own >>> premium services. A large proportion of traffic these days is served from >>> within the same ISP as the user is connected to (esp using CDNs), so this >>> would be very "useful". Using a DSCP for alternate ECN semantics is already >>> recommended in RFC4774, so the IETF would not really need to do anything to >>> get L4S started. >>> >>> The IETF might want to head-off possible interop problems by assigning a >>> global-scope DSCP for the alternate ECN semantics, which we all know are in >>> short supply. If we did use a global-scope DSCP it would be solely for a >>> migration period, which is a double-edged sword: >>> * if the migration period were truly short-lived, the global codepoint >>> would become available soon. >>> * if the migration period took longer, fears that burning a global DSCP for >>> just a brief migration period would have proved unfounded. >>> >>> If local-only L4S usage became widespread, it /could/ be used between >>> domains by simply ignoring the DSCP and only using ECN as the classifier. >>> But that all depends on what the take-up of classic ECN is in the meantime, >>> and whether all the major classic ECN host implementations migrate to L4S. >>> (In the TCP Prague Bar BoF, Andrew McGregor made the point that all the >>> major OS developers who control what nearly all the Internet's traffic >>> looks like were in the room.) Then the IETF could come along afterwards and >>> standardise the new ECN semantics. >>> >>> >>>>> 3. preferably classic (RFC3168) ECN and 'L4S' ECN would not permanently >>>>> burn two codepoints, since it seems that 'L4S' could eventually >>>>> subsume classic ECN (a fork would not be needed, because classic >>>>> ECN doesn't seem to do anything that L4S cannot do). >>>>> >>>> This is "nice to have", I suppose; but it seems too optimistic >>>> to take seriously. 
Deployment of L4S will take at least five years; >>>> and nobody's crystal-ball is good enough to see beyond that. >>> >>> Deployment of something that enables new valuable apps and products can >>> take 2yrs. >>> >>> Nonetheless, your general point is true. >>>> >>>> Furthermore, I don't see how we can _ever_ entirely eradicate the >>>> RFC 3168 behavior of "same as drop". >>> >>> <Flippant> According to measurement studies RFC3168 behaviour is currently >>> entirely eradicated. At least at such a low level that the two CE packets >>> that seemed to exhibit the behaviour are probably a symptom of bugs. >>> </Flippant> >>> >> Even this remark, I can't resist a response (although I see this also is >> discussed again later): >> >> That was not what our measurements looked for - the measurements we did with >> Brian, Mirja and Richard did not try to induce congestion. Specifically they >> were short test sequences to test ability to pass and negotiate the ECN >> codepoint usage. We hence don't *know* whether any network devices had >> ECN-marking support enabled, we just know that if they had, these devices >> didn't experience congestion at the time of the test. We also know they >> didn't experience (congestive) loss, for what little that's worth. >> >> My point is: I *do* have ECN enabled in at least one of my home routers, >> presently it's not a bottleneck, and hence I see no CE-marks, and if it >> becomes congested I know it will use FQ-Codel to mark (whatever way that >> does) - I can't control this as an endpoint - and in general as a transport >> I can't find out what ECN-marking has deployed along my path, if there >> happens to be any. > I prefer to distinguish between what relatively small numbers of enlightened > individual engineers might have deployed and what is deployed by large > organisations on behalf of 'the masses'. > > I am not saying we should screw-over those enlightened engineers. 
I'm just > saying that we don't need to worry about backwards compatibility for > individual early adopters, because > a) they are likely to keep up to date > b) if they don't, they are likely to understand that they need to update when > they experience problems. > > AFAICT, classic ECN is not in mainline Linux and not on-by-default in any > other router OS > So everyone needs to think really, really carefully before they allow that.

"Really, really carefully" only because that would "burn" the codepoint, rather than reserving it for a scheme that requires installing a dual queue in routers AND precise feedback from receivers AND a sender behaviour that (so far) isn't compatible with any other queue doing CE-marking? That's asking a bit much, I think.

>>> <Seriously>I know what you mean: RFC3168 behaviour is latent in ECN-enabled >>> servers waiting for a client request. >>> >> So: It's going to be hard to require (MUST) a new replacement marking in >> network devices. It may be possible to allow one, and this would be great if >> there were appropriate mechanisms at the endpoints to detect this and do >> something sensible. > > To clarify, I don't think you mean that an RFC will need to say network > devices MUST respond to this new replacement marking. Rather it will need to > say network devices that respond to classic ECT MUST also check for this new > replacement marking. > > And actually, I think this could be a SHOULD. As Matt Mathis said, the > consequences of not complying are no worse than the rate mismatch seen today > between two Classic TCP flows with significantly different RTTs.

I challenge that. Why would that be the case? A DCTCP-like flow will hardly react to a single CE-mark in a window of packets, whereas any other sender treats that mark as a congestion event, which normally triggers a multiplicative decrease. I agree that we shouldn't be religious about "TCP-friendliness", but this is not quite the same.
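To put rough numbers on that difference, here is a minimal sketch; it is not taken from any implementation. It assumes the commonly described DCTCP update rule (alpha = (1 - g) * alpha + g * F, then cwnd reduced by a factor alpha / 2) with g = 1/16, and a flow that has seen no recent marks (alpha = 0). The function names and the 100-packet window are illustrative only.

```python
def classic_ecn_response(cwnd: float) -> float:
    """Classic (RFC 3168 style) sender: a CE mark is treated like a loss,
    so the window is halved once per round trip with any mark."""
    return cwnd / 2.0


def dctcp_like_response(cwnd: float, alpha: float, marked: int, total: int,
                        g: float = 1.0 / 16.0) -> tuple[float, float]:
    """DCTCP-like sender: the reduction scales with the smoothed fraction
    of CE-marked packets (alpha), not with the mere presence of a mark.
    g = 1/16 is an assumed smoothing gain."""
    fraction = marked / total
    alpha = (1.0 - g) * alpha + g * fraction
    return cwnd * (1.0 - alpha / 2.0), alpha


if __name__ == "__main__":
    cwnd = 100.0  # packets, illustrative
    print("classic   :", classic_ecn_response(cwnd))
    new_cwnd, alpha = dctcp_like_response(cwnd, alpha=0.0, marked=1, total=100)
    print("DCTCP-like:", new_cwnd, "alpha =", alpha)
```

Under these assumptions the classic sender drops from 100 to 50 packets on one mark, while the DCTCP-like sender gives up a small fraction of one packet.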
A single DCTCP flow can certainly completely starve a large number of "normal" ECN-capable senders.

> The additional problem here (at least for purists) is that RFCs are intended > to be followed by developers. It is not realistic to expect equipment > operators to follow RFCs. So, if an operator already has a router with ECN > implemented (e.g. Cisco), we are expecting to ask them not to turn ECN on > unless they update the code to distinguish Classic ECN from the new > replacement identifier. > > I call this a problem for purists because it's only of concern for someone > who imagines it is important to keep the RFC series completely logically > consistent. In practice, we can discount the chance that an operator is going > to pick up a router and randomly turn on Classic ECN without finding out the > latest position, given none have turned on ECN since it was standardised 14 > years ago. > >>> I have proposed that L4S behaviour is associated with e2e negotiation of >>> new Accurate ECN semantics. Nonetheless, even if an L4S client attempts to >>> negotiate AccECN with a server, if the server only supports classic ECN, >>> the session should{Note 1} fall back to classic ECN. So we will need to >>> distinguish classic ECN from L4S ECN... unless we all agree that AccECN >>> must fall back to drop, even if the other end says it supports classic ECN. >>> >>> {Note 1} AccECN is yet to be specified by the IETF, but this is the current >>> thinking, which seems reasonable. >>> </Seriously> >>> >>>> Furthermore, L4S _can't_ >>>> eliminate packet drops; and IMHO a packet-drop in an L4S stream >>>> must be treated _differently_ than a L4S congestion mark. >>> >>> No-one is questioning that behaviour on drop needs to stay as in Reno or >>> Cubic. The question is only over whether behaviour in response to ECN-CE >>> should be distinct from drop behaviour. >>> >> I'd agree. 
Both ABE (as proposed in TCPM) and DCTCP (also TCPM) would change >> this, as I believe should any new TCPM-defined methods based on AccECN. >>>>> *ECT(1)* >>>>> Seems a good identifier, but it has the following problems: >>>>> >>>>> a) L4S traffic would need to be distinguished from classic ECN both >>>>> when unmarked (ECT0 vs ECT1) and when marked (CE vs CE???). >>>>> Ie. congestion experienced (CE) would have to be shared between >>>>> the classes. >>>> >>>> Actually, there are _two_ ways ECT(1) could be used: >>>> - ECT(1) could be set to request L4S forwarding rules marking CE >>>> to indicate L4S congestion; or >>>> - L4S forwarders could change ECT(1) to ECT(0) (or vice-versa?), >>>> to mark L4S congestion. >>> >>> The latter doesn't work, I'm afraid. Reason: >>> * If all buffers on a path (say X, Y, Z) classify L4S and Classic by ECT(1) >>> and ECT(0) resp., >>> * and if buffer X indicates L4S congestion by changing some L4S packets >>> from ECT(1) to ECT(0) >>> * then at subsequent buffers on the path (Y or Z), the L4S packets that X >>> remarked to ECT(0) will get classified into the Classic queue at Y and Z. >>> >>> Result: a proportion of L4S packets will get 'demoted' into low latency >>> queues, introducing intermittent re-ordering delay, thus increasing the >>> effective delay of the low latency L4S service to that of the classic >>> queues. >>> >> Sadly, I think probably true for any general deployment - this I think was >> one of the motivations for marking this method with a different DSCP when >> used internally within a provider network when RFC4774 was discussed. >>>> >>>>> It would not be so problematic if all queues classified all CE >>>>> packets as the lowest latency class (L4S); CE packets from classic >>>>> flows would then be delivered early out of order, requiring some >>>>> buffering, but probably no more buffering than is already needed >>>>> for retransmissions, and at least they would never be late out of >>>>> order. 
See also {Note 1}. >>>> I'm trying to follow this... >>>> >>>> What exactly does Bob mean by "all queues"? Mostly we think of >>>> queues as part of the forwarding action. But some forwarders choose >>>> their action upon packet entry to the queue; others at packet exit. >>>> And, AFAIR, no forwarder takes an action based upon the packet being >>>> CE-marked when it arrives. >>> >>> I didn't intend to say anything about whether actions are on entry or on >>> exit to the queue - I don't think that's relevant here. >>> >>> Again, I was thinking about the problem of one queue remarking some packets >>> in both L4S and Classic ECN to CE, then how those CE packets would get >>> classified at subsequent queues on the path. >>> >>> With solely L3 classification, all CE packets would have to be classified >>> as L4S, including those from Classic ECN flows marked as CE earlier on the >>> path. >>> >>> (I tried ASCII art, but email mangled it.) >>> >>> As I said, mis-classifying is not a problem as long as it is arranged to be >>> from the worse to the better queue. >>> >>> >>>> >>>>> b) ECT(1) is the last available ECN codepoint (for both v4 & v6). >>>>> Using ECT(1) for L4S and ECT(0) for Classic ECN would burn the last >>>>> codepoint just for migration purposes (contravening my criterion >>>>> #3). If we could predict that migration might one day finish, we >>>>> could foresee a time when ECT(0) might become available again. >>>>> But that's a long shot. >>>> This is a political problem, more than a technical one. >>>> >>>> We've painted ourselves into a corner, where there aren't spare >>>> bits -- and the "spare bits" in IPv6 turn out to be unusable. (We >>>> seem to have done this quite deliberately -- I don't understand why!) >>>> >>>> Nonetheless, we have a major need to mark incipient congestion, so >>>> that we can avoid over-filling buffers at forwarding nodes. 
The fact >>>> that we have only half-a-bit left to do this is the inevitable result >>>> of our refusal to allocate enough bits in the first place (or if you >>>> prefer, our insistence on using six bits for DSCP, defined in such a >>>> way as to prevent end-to-end meaning of them). >>>> >>>> (Personally, I'd love to reclaim a few bits from DSCP; but to propose >>>> this would label me a clueless kook, so I won't.) >>>> >>>> ECT(1) is there! It's allocated for ECN use. Refusing to define it >>>> with an ECN meaning is simply irrational. >>>> >>>> Furthermore: there _is_ another bit! See RFC 3514. ;^) >>>> >>>>> c) For the record, the following uses of ECT(1) have been proposed by >>>>> the IETF and by researchers: >>>>> * receiver cheat detection (the ECN nonce [RFC3540] - experimental) >>>>> * ECN path testing (ECN for RTP [RFC6679] - standards track) >>>>> * various intermediate congestion level proposals (including PCN >>>>> [RFC6660] - standards track) >>>>> * various fast-start proposals (in research, e.g. VCP) >>>> IMHO, only RFCs count as "proposals". >>>> >>>> RFC 3540 is ripe for deprecation, IMHO. >>>> >>>> RFC 6679 covers "ECN for RTP over UDP". Somehow I missed it coming >>>> out in 2012 (though I must have been listening to the IESG telechat >>>> where it was approved). Mea culpa! >>> >>> I missed it too. During the WG process in AVT I wrote a long review >>> including concern about the ECT(1) parts. The authors took lots of my >>> concerns into account, but I wasn't awake to notice that they had left in >>> the ECT(1) parts when it went to WG last call and subsequently to IESG. >>> >>>> >>>> It's not an easy read (58 pages, heavy with RTP details)! At first >>>> blush, I don't see what it's trying to do with ECT(1). It references >>>> RFC 3168 for the meaning of ECT(1); it keeps separate counters for >>>> ECT(0) and ECT(1); and it has a "random" mode (not RECOMMENDED) which >>>> is supposed to randomize whether ECT(0) or ECT(1) is sent. 
>>>> >>>> The overall impression is that it tries to define feedback for all >>>> possible ECN cases: thus supporting ECN Nonce use as well as all other >>>> uses known at the time it was written. >>>> >>>> To deprecate ECN Nonce, we'd need to UPDATE RFC 6679 >>> >>> What about existing implementations of 6679? >>> >>>> as well as >>>> RFC 3168; but I don't see any new issues introduced by 6679 (and the >>>> features of it are already appropriate for L4S). >>>> >>>>> d) PCN is defined for a controlled environment, so that's not a problem. >>>>> The wording of RTP-ECN does not mandate the use of ECT(1), but it is >>>>> not always clear that it is optional either. >>>> Clearly, keeping separate counters for ECT(1) and ECT(0) is required; >>>> but sending ECT(1) vs ECT(0) is not specified within RFC 6679. >>>> >>>>> So I am trying to find out whether any implementations have used >>>>> ECT(1). >>>> At first blush, it would appear that the only _current_ use of ECT(1) >>>> would be for ECN Nonce. But of course, RFC 6679 says nothing to prevent >>>> its use for L4S. >>>> >>>>> Even if none of the IETF uses of ECT(1) are problematic in practice, >>>>> we should think very carefully before burning ECT(1) for L4S, >>>>> because there do appear to be new uses being proposed for it that >>>>> address a new potentially important class of problems: getting up to >>>>> speed fast. >>>> Some citations, please... >>> >>> VCP: Xia, Y., Subramanian, L., Stoica, I. & Kalyanaraman, S., "One more bit >>> is enough," Proc. ACM SIGCOMM'05, Computer Communication Review >>> 35(4):37--48 In: SIGCOMM '05: Proceedings of the 2005 conference on >>> Applications, technologies, architectures, and protocols for computer >>> communications Vol.35 No.4 pp.37-48 ACM Press (2005) >>> <http://doi.acm.org/10.1145/1080091.1080098> >>> >>> Kunniyur, S.S., "AntiECN Marking: A Marking Scheme for High Bandwidth Delay >>> Connections," In: Proc. 
ICC'03 IEEE (May 2003) >>> <http://repository.upenn.edu/cgi/viewcontent.cgi?article=1053&context=ese_papers> >>> >>> >>>> >>>> (BTW, I think L4S could be _very_ helpful for "speeding up" slow-start.) >>> >>> Indeed. I have ideas myself too (unsurprisingly). >>> >>> I think the only workable schemes ought not to rely on a new packet >>> marking, but we might not be able to achieve this ideal, so I would be wary >>> of burning the last ECN codepoint 'just' to distinguish between something >>> we want to get to, and something we know will be legacy one day. >>> >>> >>>> >>>>> *DSCP* >>>>> It might be better to distinguish L4S ECN from Classic ECN by using >>>>> only ECT(0) and CE, but also using a distinctive DS codepoint for L4S. >>>>> L4S could start off local-network only (e.g. for a network operator's >>>>> premium services), or a global DSCP could be burned so that hosts could >>>>> set it without needing to be configured for the network they happen to >>>>> be connected to at any one time. >>>> I don't see L4S as useful in "local-network-only" mode. >>> >>> Please listen to the ISPs who have seen the DCttH demo. They have their own >>> ideas of what is 'useful' to them. >>> >>>> >>>> Granted, there _are_ many cases where the benefit of L4S would be >>>> greatest at the first hop (DOCSIS box). But the expected "bleaching" >>>> could be very confusing as to the meaning of CE marking that could be >>>> generated farther along the path. There is no such thing as a condition >>>> where _only_ the first hop can experience congestion. >>> >>> If an ISP were using L4S for delivery from its local caches and servers, it >>> would be doing the bleaching itself at its border. It would not be overly >>> concerned if its bleaching prevented 'OTT' traffic from using the L4S >>> queues in the bottlenecks within its access network. 
>>> >>>> >>>>> Then, assuming all Classic ECN might eventually migrate to L4S ECN, >>>>> a DSCP would no longer be needed as well as ECT(0) to identify L4S. >>>>> Then the ECN field alone could represent L4S end-to-end. >>>> This is overly optimistic. >>> >>> Again, in Prague people from the major OS developers were talking as if >>> such optimism would not be so mad. >>> >>>> >>>>> We all know that DSCP has the following problems: >>>>> a) Diffserv is not orthogonal to Diffserv (obviously), so multiple DSCPs >>>>> might be needed for L4S in each DS class >>>> That seems fatal... >>> >>> Not necessarily. For instance, a separate L4S DSCP might only be needed to >>> distinguish BE L4S from BE Classic. >>> >>> Indeed, once L4S gives all traffic low latency, ISPs and enterprises don't >>> necessarily need to distinguish between EF and AF, etc. Then an ISP or >>> enterprise might make all its 'premium' (non-BE) traffic solely L4S without >>> any need to distinguish a Classic subset. >>> >>>> >>>>> b) DS is not end-to-end >>>>> c) few global DSCPs left, altho certainly there are more DS codepoints >>>>> than ECN codepoints left. >>>> Network operators don't believe in "global DSCPs". They bleach anyway. >>>> >>>> (I would tend to support carving out part of the "Experimental" >>>> subset of DSCPs as "must propagate if not understood" -- and possibly >>>> in ten years there might be enough equipment out there that respected >>>> that... but for now, it _all_ gets bleached.) >>> >>> Half true. Bilateral arrangements are starting to appear at some borders (DT >>> is leading the way). Where there is a global DSCP, this helps make such >>> bilateral arrangements simple to administer and deploy. >>> >>> (BTW, there is no point writing RFCs that dictate what operational policies >>> must be used.) >>> >>>> >>>>> *Summary* >>>>> Combining ECT(0) and CE with a globally assigned DSCP solely during >>>>> initial deployment of L4S seems the least worst choice. 
>>>> We certainly could Experiment with that; but I'm very pessimistic. >>> >>> Unfortunately, I think the IETF has to make a choice. >>> >>> I overheard Spencer Dawkins say: "The IETF is only good at doing the right >>> thing when there is no other choice." >>> >>>> >>>> OTOH, Experimenting with ECT(1) seems likely to work. IMHO... >>> >>> I already said that either could 'work'. I tend to agree that using DSCP >>> /and/ ECN is likely to fall foul of the union of the sets of problems that >>> both suffer from. >>> >>> I'm currently in information gathering and promulgation mode, not decision >>> mode - I might switch to preferring ECT(1) depending on what we find out. >>> >>> My concern is primarily about burning the last ECN codepoint merely for a >>> transition arrangement. >>> >> I'd certainly need more persuasion to use ECT(1) for this, given all that is >> above. A new DSCP requires a lot of consensus if it is to be standards >> action, and less if local use. To me at least, I think this is worth >> exploring, if the arguments can be presented simply. > > We would still require the RFC language discussed earlier: an ECN-enabled AQM > SHOULD check for the new identifier (DSCP in this case).

Not a MUST? Because if it doesn't, your sender MUST be able to cope with that, I'd say.

Cheers,
Michael
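
For reference, the "promote, never demote" classification discussed earlier in the thread can be sketched as below. This is only an illustration under stated assumptions: it takes ECT(1) as the L4S identifier (one of the options under discussion, not a decided design), and the queue labels and the `classify` function are hypothetical names. The two-bit codepoint values are those of the ECN field in RFC 3168.

```python
from enum import IntEnum


class ECN(IntEnum):
    """Two-bit ECN field values as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


def classify(ecn: ECN) -> str:
    """Pick a queue from the ECN field alone (hypothetical queue labels).

    CE is always sent to the L4S queue, because a CE packet no longer
    reveals whether it started life as ECT(0) or ECT(1). A Classic packet
    marked CE upstream is therefore forwarded early, never late.
    """
    if ecn in (ECN.ECT1, ECN.CE):
        return "L4S"
    return "Classic"


if __name__ == "__main__":
    for codepoint in ECN:
        print(f"{codepoint.name:8s} -> {classify(codepoint)}")
```

The alternative of signalling L4S congestion by rewriting ECT(1) to ECT(0) would fail with this classifier for the reason given in the thread: the rewritten packets would be placed in the Classic queue at every later hop.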
