Re: [aqm] Review of draft-ietf-aqm-ecn-benefits-03

Gorry Fairhurst Tue, 28 Apr 2015 06:00:05 -0700

Dear Mirja,

Thank you very much for your detailed review! Answers below:


> Begin forwarded message:
>
> Date: 23. april 2015 kl. 19.28.54 CEST
> From: Mirja Kühlewind <[email protected]>
> To: <[email protected]>, Michael Welzl <[email protected]>, Gorry Fairhurst <[email protected]>
> Subject: Review of draft-ietf-aqm-ecn-benefits-03
>
> Hi Gorry, hi Michael,
>
> as promised here is my review of draft-ietf-aqm-ecn-benefits-03.
>

> My overall comment is that even after reading the document (or even slightly more than before) I'm not completely sure what the purpose of this document is, or what audience it is directed to. Currently this document seems to do two things: 1. it lists benefits (which is interesting for someone who thinks about enabling ECN) and 2. it kind of outlines the steps needed for deployment (which would be directed to someone who gets the task from his manager to turn on ECN). However, the second point is not clearly spelled out, and therefore the second part of the document might be rather confusing for some readers. Also, the second part is to some extent still work in progress; therefore I would recommend focusing this document only on the first part.

> For the first part (listing benefits) it might also be good to make clear/distinguish who has these benefits. I think all benefits that are currently listed are only advantageous for the end host/application. Are there any benefits for a network operator? Would it be possible to write this document such that I could also use it to point network operators to and give them an incentive to enable ECN?

MW/GF: we can only think of one benefit of ECN as currently defined (i.e. without basing it on ConEx documents) that obviously targets the network operator: making incipient congestion visible (such that it could be used e.g. for ConEx). This is addressed in section 3.5. Since this is one out of six listed benefits in the table, creating categories for end host / application vs. network operator seems unnecessary to us.

> Another high-level comment is that you say in the introduction that this document "also identifies some potential problems that might occur when ECN is used" but then you don't really discuss them. I think showing both sides of the coin would make the document more useful (and more honest). One point that you mention briefly here is that cheating is easier than with loss, by not providing the feedback. Another point might be fairness between ECN and non-ECN traffic, as marking will not reduce the queue length and therefore might lead to a higher loss rate for the non-ECN traffic instead. I guess there are papers about this; I don't have any at hand right now. Are there any other problems that should be mentioned?

MW/GF: This was discussed, and we agreed to remove the "drawbacks" discussion, to align with the original proposed work. So, we will remove this sentence from the introduction (it is in fact a left-over that should have been removed before). As for fairness, it seems to us that the related thread has concluded without a clear result. In the absence of evidence or references we prefer to stay away from hand-waving about this matter in the document.


> Find more detailed comment by section below:
>
> Abstract
> --------

> ...says "...potential benefits when applications enable Explicit Congestion Notification (ECN)" -> usually an application cannot enable ECN because usually it's a system setting...?


MW/GF: Good catch! We'll rephrase this as "..when enabling".


> Section 1
> ---------
> ... says "..separate
>   configuration of the drop and mark thresholds is known to be
>   supported in some network devices and this is recommended
>   [RFC2309.bis]."

> RFC2309bis does not recommend different settings, it only says that it should be possible to have different configurations of both. Further, I think this should not only concern THE threshold (whatever this is); usually there are several parameters you might want to set independently of each other, e.g. the max mark/drop probability in RED.


MW/GF: Suggested update:
"While it has often been assumed
that network devices should CE-mark packets at the same level of
congestion at which they would otherwise have dropped them, separate
configuration of the drop and mark conditions. Such separate configuration is
known to be supported in some network devices and this is recommended
[RFC2309.bis]."
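
To illustrate what separate configuration of marking and dropping can mean in practice, here is a minimal Python sketch of a RED-style linear ramp with independent parameters for marking and for dropping. The class name, function name, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the draft or from RFC2309.bis, and overload handling for ECN-capable packets is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class RampConfig:
    """RED-style linear ramp (hypothetical parameter names)."""
    min_th: float  # average queue length where the probability starts rising
    max_th: float  # average queue length where the ramp ends
    max_p: float   # probability reached just below max_th

    def probability(self, avg_queue: float) -> float:
        if avg_queue < self.min_th:
            return 0.0
        if avg_queue >= self.max_th:
            return 1.0
        span = self.max_th - self.min_th
        return self.max_p * (avg_queue - self.min_th) / span


def signal_probabilities(avg_queue, ect, mark_cfg, drop_cfg):
    """Return (p_mark, p_drop) for an arriving packet.

    ECN-capable (ECT) packets follow the marking ramp and all other
    packets follow the dropping ramp. Overload handling is omitted.
    """
    if ect:
        return mark_cfg.probability(avg_queue), 0.0
    return 0.0, drop_cfg.probability(avg_queue)


# Assumed example values: marking starts earlier and ramps higher than dropping.
mark_cfg = RampConfig(min_th=5, max_th=15, max_p=0.2)
drop_cfg = RampConfig(min_th=10, max_th=30, max_p=0.1)
```

With these assumed values, an average queue of 10 packets already produces marks for ECN-capable traffic while non-ECN traffic is not yet dropped, which is one way the thresholds and the maximum probabilities can be set independently.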

> Section 2
> ----------

> 1) I'm not sure I understand the purpose of this section, or maybe just the title is wrong. I'm currently seeing this section rather as a section that provides the needed background knowledge than one that is talking about deployment. For this purpose I'd put all references and potentially a brief summary of other RFCs/drafts on ECN in this section, including RFC2884, RFC4774, RFC5562, RFC6040, RFC6679, draft-briscoe-tsvwg-ecn-encap-guidelines and draft-ietf-tcpm-accecn-reqs (and rename it).

MW/GF: This section lists requirements for deployment. Suggestion: rename to "ECN deployment requirements"


>
> 2) Second paragraph says:

> "Network devices must not drop packets solely because these codepoints are used [RFC2309.bis]."
> Not sure this is the right document to say this (because currently it does not seem to be directed to network operators/equipment vendors but to admins/application developers). However, if it says this, it should also say that network devices should not bleach these bits.

MW/GF: suggest: "Network devices must not drop packets solely because these codepoints are used or erase these codepoints [RFC2309.bis]."
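
As an illustration of what "bleaching" (erasing) the codepoints means, the following Python sketch compares the ECN field of a packet as sent and as received. The codepoint values are those defined in RFC 3168; the function names and the category labels are hypothetical and are not taken from the draft or from [TR15].

```python
# ECN codepoints from RFC 3168: the two least-significant bits of the
# IPv4 TOS / IPv6 Traffic Class byte.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def ecn_codepoint(tos_byte: int) -> int:
    """Extract the 2-bit ECN field from a TOS / Traffic Class byte."""
    return tos_byte & 0b11


def classify_path_effect(sent_tos: int, received_tos: int) -> str:
    """Describe what the path did to the ECN field (hypothetical labels)."""
    sent = ecn_codepoint(sent_tos)
    received = ecn_codepoint(received_tos)
    if sent == received:
        return "unchanged"
    if sent in (ECT_0, ECT_1) and received == CE:
        return "ce-marked"   # legitimate congestion signal
    if received == NOT_ECT:
        return "bleached"    # codepoint erased on the path
    return "altered"         # any other unexpected change
```

For example, `classify_path_effect(0x02, 0x00)` reports a packet sent as ECT(0) that arrived with the field zeroed.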



> 3) First bullet in list says

> "A recent survey reported growing support for ECN on common network paths [TR15]."
> This sounds as if TR15 shows that ECN is actually used in the Internet. However, TR15 only shows that there are very few cases left where ECN packets are dropped or incorrectly altered. Please clarify or remove this sentence here.

MW/GF: suggest: "A recent survey reported that incorrect altering of ECN bits or consistent dropping of packets carrying the ECN codepoint is rare on common network paths [TR15]."

> 4) You could cite draft-bensley-tcpm-dctcp-00 instead of the DCTCP SIGCOMM paper (or both).


MW/GF: The paper is a stable reference for now.
But if/when the IETF decides on this, we can add a reference.

> 5) I would remove the subsection headings (both 2.1 and 2.2) and just add the text there to the main part of the section.


MW/GF: OK


> 6) "An AQM algorithm that supports ECN needs to define
>   the threshold and algorithm for ECN-marking."

> This is kind of self-redundant and therefore does not really make sense to me to say; of course an algorithm that supports ECN needs to say something about ECN...

MW/GF: We agree, but suggest keeping it nevertheless; it is a hint to document authors not to forget that they should specify ECN rather than just assuming some default behaviour.

> 7) You can use TR15 to provide a reference for the first paragraph in section 2.2:

> "Cases have been noted where a sending endpoint marks a packet with a
>   non-zero ECN mark, but the packet is received with a zero ECN value
>   by the remote endpoint."

MW/GF: OK, will add the reference there

> 8) I'd move the second paragraph of section 2.2 ("The current..") to a potentially new problems section, talking about known/previous deployment problems.

MW/GF: the document does not accentuate problems in this way, as a result of prior discussion. We therefore think that this paragraph is ok in its current place.

> 9) I would simply remove paragraphs 3-4 of section 2.2 because this was basically already mentioned by referring to 2309bis and RFC6040 in section 2.1.

MW/GF: we do think these paragraphs add value here: they describe the problem in greater detail than the text before, explaining the problem here is different - and, we think, better - than just pointing to references.



> Section 3.2
> ------------

> 1) Don't understand why there is a listing here...? Just remove the listing and make text out of it...?


MW/GF: This is to help identify the entities that need to collaborate.


> 2) The sentence "This also
>      avoids the inefficiency of dropping data that has already made it
>      across at least part of the network path."

> does not belong in this section. This sentence should just be moved to section 3.1 (or to its own section) and must be further explained, saying that dropping a packet at the end of the path has already blocked resources that other traffic could otherwise have used.


MW/GF: agreed. We will insert it at the beginning of section 3.1.


> Section 3.3
> -----------

> 1) I'd say this section misses one part of the discussion. It is true that if by chance your last packet(s) get lost, ECN can help. However, this section reads a little as if it were safe to send packet bursts with ECN. That is not true, because even if ECN is used by a network device, the queue might be too small to hold the whole burst. I believe this case happens very often, which might be a reason for the higher tail loss probability that is sometimes experienced with IW10. Please add this point to the discussion.

MW/GF: we agree that we shouldn't say that "with ECN it is ok to send packet bursts" - we want to stay away from such general recommendations and just state the potential benefit of ECN when it saves the last packet of a burst. See our next comment for more:

> 2) I don't really get the point of the second paragraph. First of all it is confusing that this paragraph starts with "In addition to avoiding HOL blocking,.."; I guess that is left over from a previous version of this text...? And then you talk about a connection that is currently idle, so why is the performance of this connection, which is currently not sending anything, reduced?

MW/GF: indeed it seems that this paragraph has been mangled during updates. To address your item 1 and 2, we suggest the following replacement:


***
"While using ECN can never guarantee loss prevention, and thus losses
at the end of a burst can occur with or without ECN, using ECN can increase
the chance for that last packet to be ECN-marked instead of dropped. This can allow the
transport to avoid the consequent loss of state about the network path it is
   using, which would have arisen had there been a retransmission
   timeout.  Typical impacts of a transport timeout are to reset path
   estimates such as the RTT, the congestion window, and possibly other
   transport state that can reduce the performance of the transport
   until it again adapts to the path."
***

> 3) I don't understand what "applications that send intermittent bursts of data, and rely upon timer-based recovery of packet loss" are...? Isn't the transport responsible for not sending bursts and for taking care of recovery...?

MW/GF: MPEG-DASH traffic for instance, in particular when used over non-paced TCP. UDP-based applications too.

> 4) For the last paragraph in section 3.3 note that stacks often remember RTT measurements for a certain IP address and set the initial RTO based on this information.


MW/GF: suggestion: replace:
***
because in this
   case TCP cannot base the timeout period on prior RTT measurements
   from the same connection.
***
with:
***
because in this
   case TCP may not be able to base the timeout period on prior RTT
   measurements.
***
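
A minimal Python sketch of the point under discussion: how the first retransmission timeout differs when a stack has, or does not have, a remembered RTT for the destination. The 1-second initial RTO and the first-measurement formula follow RFC 6298; treating a cached RTT as if it were a first measurement, and the function and parameter names, are assumptions made only for this illustration.

```python
def initial_rto(cached_rtt=None, min_rto=1.0, clock_granularity=0.001):
    """Return an RTO in seconds for the first packets of a connection.

    cached_rtt: RTT remembered for this destination, or None (assumption:
    the stack keeps such a per-destination cache).
    min_rto: lower bound on the RTO; RFC 6298 uses 1 second.
    """
    if cached_rtt is None:
        # No prior measurement: RFC 6298 initial RTO of 1 second.
        return 1.0
    # Treat the cached value like a first RTT measurement R (RFC 6298):
    # SRTT = R, RTTVAR = R/2, RTO = SRTT + max(G, 4 * RTTVAR).
    srtt = cached_rtt
    rttvar = cached_rtt / 2
    rto = srtt + max(clock_granularity, 4 * rttvar)
    return max(min_rto, rto)
```

With the RFC 6298 minimum, a cached RTT of 50 ms still yields a 1-second RTO; a stack that uses a smaller minimum, for example `initial_rto(0.05, min_rto=0.2)`, would time out sooner.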


> Section 3.4
> -----------

> You still need FEC or some kind of error concealment even if ECN is used, because you can never be sure that your packets do not get dropped (by non-ECN-enabled devices or for other reasons). Therefore using ECN will clearly not reduce complexity. The only thing you can do is to potentially reduce the amount of redundancy you send if you know that a certain path is ECN-enabled or don't see losses at the beginning of a connection. This can save network resources but actually might not improve user experience; in fact the user experience might be worse in case there are sudden losses.


MW/GF: suggestion: remove "add complexity and"

> Further the text says "negative impact of using loss-hiding mechanisms"; I don't really think that FEC has a negative impact as long as you've sent enough redundancy...? Error concealment might, but is used less and less. I'd recommend talking about error concealment only in this last paragraph and explaining a little further.

MW/GF: error concealment is different from FEC, and it is only mentioned in this last paragraph. We suggest to replace "Because this reduces the negative impact of using loss-hiding mechanisms," with "Because this can reduce the potential negative impact that some loss-hiding mechanisms can have,"



> Section 3.5
> ----------
> "Recording the presence of CE-marked packets can therefore provide
>   information about the performance of the network path."
> Would change to:

> "Recording the presence of CE-marked packets in absence of loss can
>   therefore provide
>   information about the performance of the network path."

MW/GF: ok

> And also say more concretely what is meant with 'performance of the network path' -> congestion level or no drops by other middleboxes on this path...

MW/GF: This was intentionally kept vague, but we'd welcome a concrete recommendation by a ConEx expert (indeed "or .. or ..." is the problem, there are several possibilities here)



> Section 3.6
> -----------

> 1) I like the section but I would phrase it differently; also it's not clear who needs to support what in this case. I'd like to propose the following text [not sure about the heading...]:

>
> "3.6 Opportunity to provide an improved congestion feedback signal
>

> Loss and ECN marking are both used as an indication of congestion. However, while the amount of feedback that is provided by loss should naturally be minimized, this is not the case for ECN. With ECN a network node could provide richer and more frequent feedback on the congestion state of a link, which could then be used by the control mechanisms implemented in end hosts to make a more appropriate decision on how to react to congestion and to react faster to changes in congestion state.


MW/GF: ok to add this up to here.

> Further, while drop-based AQM mechanisms usually operate on a smoothed queue length estimate (instead of the instantaneous queue length) and therefore slightly delay the feedback signal to avoid unnecessary losses in case of transient congestion, this would not be necessary for ECN. If congestion is only transient due to short traffic bursts that are active for less than one RTT, the congestion signal would reach the sender at a time when the congestion has already cleared up. However, instead of delaying the feedback in the network, the end host could reduce its sending rate incrementally based on the extent of congestion (that was experienced over e.g. the last RTT), similar to DCTCP. In case the congestion is only transient, the end host would only reduce its rate slightly and be able to catch up quickly again. However, in case the congestion is persistent, this would help to remove additional delays from the network and resolve congestion faster, which after all reduces the average queuing delay.

> However, current ECN is defined as a 'drop equivalent' in RFC3168. To change the semantics of ECN, both the AQM in the network nodes and the control mechanism in the end hosts would still need to cope with nodes or end hosts that rely on the old semantics. Therefore changing the semantics can be done more easily in a confined environment such as a data center. DCTCP is an example that changes both the configuration of the AQM used and the congestion response in the end host, and relies on the fact that all nodes in a data center are configured the same way. [Deployment strategies to change the semantics of ECN in the Internet are currently under discussion in the IETF.]"

MW/GF: We think that this goes a bit too far in the direction of hinting about implementation and research possibilities that we don't have citable proof about (besides: we already refer to DCTCP twice in the document, and the 'drop equivalent' semantics are not a MUST in RFC3168).
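
For readers unfamiliar with the incremental response mentioned in the proposed text (reducing the rate according to the extent of congestion over roughly the last RTT, similar to DCTCP), here is a minimal Python sketch of a DCTCP-style window update. The update rule follows the published DCTCP description; the function name, the per-window calling convention, and the one-segment floor on the window are assumptions made for this illustration, and nothing here is proposed text for the draft.

```python
def dctcp_update(alpha, cwnd, marked, total, g=1 / 16):
    """One DCTCP-style update per window of acknowledged packets.

    alpha:  running estimate of the fraction of CE-marked packets
    cwnd:   congestion window in segments
    marked: CE-marked packets acknowledged in the last window
    total:  all packets acknowledged in the last window
    g:      estimation gain (1/16 is the commonly cited value)
    """
    frac = marked / total if total else 0.0
    alpha = (1 - g) * alpha + g * frac
    if marked:
        # Reduce in proportion to the estimated extent of congestion,
        # instead of halving as for a loss.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    return alpha, cwnd
```

A single fully marked window starting from `alpha = 0` reduces a window of 100 segments only to about 97, whereas persistent marking drives `alpha` towards 1 and the reduction towards a halving.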

> 2) I'd move the 1st and 2nd paragraphs of section 3.6.1 to the background/deployment section or to the intro, depending on what you are going to do with section 2.

MW/GF: since we intend section 2 to be about deployment requirements only, we don't think this fits and would rather leave these paragraphs in section 3.6.1.



> Sections 4 & 5
> ---------
> First sentence talks about "operational
>   difficulties when the network only partially supports the use of ECN,
>   or to respond to the challenges due to misbehaving network devices
>   and/or endpoints".

> I think these are two very different things. Misbehaving network devices are a point for a problems section (where the lesson learned is that we didn't think carefully enough about incremental deployment in the first place but do now). However, partial deployment is not a problem but a thing we simply have to cope with. The text sounds as if the goal were that every router in the whole Internet would at some point in time be ECN-enabled. I don't think this will ever happen, and it is also not the goal for me. Routers that are very unlikely to ever get congested should not be required to look at the ECN bits or monitor the queue length to calculate a mark/drop probability.

MW/GF: we agree, and suggest to replace this sentence with "Early
   deployment of ECN encountered a number of operational
   difficulties due to misbehaving network devices
   and/or endpoints."

> However, as I said at the beginning, I don't really think that sections 4 and 5 belong in this document. If you decide to keep them (you would have to change the abstract), I'd recommend renaming them, e.g. 4. 'Incremental Deployment Strategy' or 'Requirements to enable Incremental Deployment' and 5. 'Recommendations for enabling ECN in network nodes and end hosts'.

MW/GF: we suggest to insert the fact that we discuss deployment in the abstract, and rename these sections to 4.: "Incremental Deployment" and 5.: "Recommendations for enabling ECN"



> I hope that's helpful! Let me know if you have any questions!
>
> Mirja
>
>


Thank you very much,

Michael & Gorry


_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
