David,
At 22:46 13/04/2015, David Lang wrote:
On Mon, 13 Apr 2015, Bob Briscoe wrote:
David,
Returning from a fortnight offlist...
I think your conception of how ECN works is incorrect. You describe
ECN as if the AQM marks one packet when it drops another packet.
You say that the ECN-mark speeds up the retransmission of the
dropped packet. On the contrary, the idea of classic ECN [RFC3168]
is that the ECN marks replace the drops. In all known testing
(except pathological cases), classic ECN effectively eliminates
drops for all ECN-capable packets.
That's what I thought, and if that was the case, then marking
packets as ECN-capable would mean that they would have an advantage
over non-ECN packets (by not getting dropped, so getting a higher
share of bandwidth)
This is a common fallacy. An ECN-capable TCP achieves the same
throughput as an otherwise identical non-ECN TCP. The fallacy comes
from people who think that the network caps the rate by removing
packets. But the source determines the rate by how many drop or mark
signals it sees. For today's ('classic') ECN, the source's rate
reduction in response to either signal is the same; and in both cases
the reduction is voluntary.
There will be a tiny difference in goodput, because of the
retransmissions. However, the loss (or marking) probability that TCP
uses to determine its rate is a fraction of a percent, so this
difference is in the noise.
that's what the gaming ECN thread was about, and if I understood the
responses, I was being told that marking packets as ECN-capable, but
not slowing down (actually responding to ECN) would not let an
application get any advantage because the packets would just end up
getting dropped anyway, since marking and dropping happen at the
same level, even on ECN-capable flows.
Nope.
I recall that thread. People came up with a number of complicated
arguments for why it is hard to game ECN, but none were solid. Below
I have given a simple argument that I think is solid on its own. I
thought about intervening at the time, but this stuff needs care and
time that I didn't have then.
1) For the most simple and complete argument, all you need to know
about the ECN behaviour of AQMs is: under normal load conditions, an
AQM decides it's time to send a congestion signal irrespective of
whether the next packet is ECN-capable or not. Then, if the next
packet is ECN-capable, it marks it, else it drops it. This is from
RFC3168, which also requires the source to respond equally to either
a loss or a mark. I call this 'classic' ECN.{Note 1}
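The decision order described in point 1 can be sketched in a few lines
of Python (a toy model, with made-up names; not from any real AQM
implementation):

```python
import random

ECT = "ECT"          # ECN-capable transport codepoint
CE = "CE"            # congestion experienced mark
NOT_ECT = "not-ECT"  # not ECN-capable

def aqm_signal(ecn_field, signal_probability):
    """Classic ECN [RFC3168] sketch: the AQM first decides whether to
    signal congestion at all, irrespective of the packet's ECN field.
    Only then does it choose mark vs drop based on ECN capability."""
    if random.random() >= signal_probability:
        return ecn_field, "forward"   # no congestion signal this time
    if ecn_field == ECT:
        return CE, "forward"          # ECN-capable: mark instead of drop
    return ecn_field, "drop"          # not ECN-capable: drop
```

The key point the sketch captures is that the signalling decision and
the mark-vs-drop choice are two separate steps.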
2) I will try to correct your misunderstanding about "marking and
dropping at the same level even on ECN-capable flows". However, to
determine whether ECN can be gamed, there's no need to go there. So
I'll come back to that as a post-script{Note 2}.
3) I will prove that it is as easy to game loss as it is to game ECN,
first considering sender cheating, then receiver cheating:
3a) Sender Cheating
From the sender's point of view, the only difference between a loss
and an ECN mark is that it has to retransmit a loss. But that has
nothing to do with the rate it can go at. If it has been programmed
to ignore congestion feedback (and instead to go at a constant
unresponsive rate{Note 2}), it is as easy for it to ignore loss
feedback as ECN feedback. See {Note 3} for an example.
3b) Receiver Cheating
* An ECN receiver can best fool an ECN-capable TCP sender into going
faster by only feeding back a small fraction of ECN marks.{Note 4}
* A non-ECN receiver could fool a non-ECN TCP sender into going
faster by only revealing a small fraction of the losses. However, it
would have to ACK undelivered bytes, and most TCP-based apps won't
work unless all bytes are delivered.{Note 5}
So it seems that it's easier for a receiver to game ECN than loss. However:
* returning to the ECN case, the sender can validate the receiver by
randomly setting an ECN mark itself on a very small proportion of
packets (probably only on unusually high rate connections). Then if
it doesn't see ECN feedback on the ACK of any one of its
self-inserted marks, it can close the connection.
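The sender-side validation just described could be sketched like this
(all names are illustrative; this is not from any real TCP stack):

```python
import random

class EcnFeedbackValidator:
    """Sketch of the check above: the sender occasionally sets a CE
    mark itself, and requires the receiver to echo every such mark."""

    def __init__(self, probe_probability=0.0001):
        self.probe_probability = probe_probability
        self.outstanding_probes = set()

    def maybe_probe(self, seq):
        """Before sending packet `seq`, decide whether to self-mark it CE."""
        if random.random() < self.probe_probability:
            self.outstanding_probes.add(seq)
            return True    # set CE on this packet ourselves
        return False

    def on_ack(self, seq, ece_flag):
        """If the ACK of a self-marked packet carries no ECN-Echo, the
        receiver is suppressing feedback: abort the connection."""
        if seq in self.outstanding_probes:
            self.outstanding_probes.discard(seq)
            if not ece_flag:
                raise ConnectionAbortedError(
                    "receiver suppressed ECN feedback")
```

A cheating receiver cannot tell a self-inserted mark from a genuine
one, so it cannot safely suppress any of them.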
In summary,
* a sender can't game ECN any more easily than it can game loss.
* a receiver can only game ECN if the sender doesn't take measures to
prevent it.{Note 6}
If the packets are just marked, but not dropped, then the
ECN-capable flows will occupy a disproportionate share of the
available buffer space, since they just get marked instead of dropped.
Nope.
The arrival rates will be the same, whether or not ECN is used (see
earlier). And recall that TCP drives the marking or loss probability
at very small fractions in all normal conditions.
Example: if there are 10 flows in a 100Mb/s link, 5 ECN and 5
non-ECN, they will all arrive at the buffer at 10Mb/s (all other
factors being equal). Then, if the loss or marking probability is
0.5%, the AQM will be marking but not dropping 1 in 200 packets in
the ECN flows whereas it would drop 1 in 200 from the non-ECN flows.
So, assuming tail drop, if there were 399 packets in this buffer, on
average 200 would be ECN-capable (40 in each of the 5 ECN flows) with
one of them marked; and 199 would be non-ECN-capable (40 in each flow
except one with 39). And one of those 199 would be a retransmission
from an earlier loss.
[Of course, we would hope that there would be 4 packets in the
buffer, not 400. The proportions would still be the same on average.
I merely used 399 to avoid fractions of packets for the averages.]
===Footnotes===
{Note 1} To be concrete, I've assumed classic ECN [RFC3168]. The
argument is similar for research approaches like "think once to mark,
twice to drop", but let's not make it more complicated than it needs to be.
{Note 2} RFC3168 (and draft-ietf-aqm-recommendation) require that,
whenever the AQM decides it is time to signal congestion on the next
packet, if the queue has been persistently long, the AQM must only
use drop as a congestion signal, irrespective of whether the next
packet is ECN-capable or not.
So, if a source naively just continues to increase its window until
it drives the queue into overload, then it would cause the AQM to
turn off ECN and consequently not be able to game ECN. But the simple
strategy of sending at a high but constant rate avoids driving the
queue into overload. So that's the strategy I described for gaming
ECN. Because strategies that don't work are irrelevant if there's a
strategy that does work.
{Note 3} Examples to show source cheating is as easy with loss as ECN:
* An ECN source sends at a constant unresponsive 90Mb/s through a
100Mb/s bottleneck. In parallel some other responsive flows (say 10
non-ECN TCP flows) squeeze themselves into the remaining 10Mb/s. They
will cause themselves (say) 0.5% loss probability, while the
unresponsive flow will experience 0.5% marking and zero loss.
* A non-ECN source can just as easily send unresponsively at 90.5Mb/s
as 90Mb/s. The other flows will still drive loss to about 0.5%, which
the unresponsive flow will now experience as well. Nonetheless, after
it retransmits the 0.5% loss it still achieves goodput of about 90Mb/s.
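The goodput arithmetic in these two bullets can be checked in a couple
of lines (toy arithmetic under the stated 0.5% figures):

```python
# Unresponsive ECN flow: sends 90 Mb/s, sees 0.5% marks, zero loss.
ecn_send = 90.0
ecn_goodput = ecn_send                       # marks cost no retransmissions

# Unresponsive non-ECN flow: sends 90.5 Mb/s, loses 0.5% to drops.
nonecn_send = 90.5
loss = 0.005
nonecn_goodput = nonecn_send * (1 - loss)    # ~90.05 Mb/s after retransmits

print(ecn_goodput, nonecn_goodput)
```

So the non-ECN cheat simply sends 0.5% faster and ends up with
essentially the same goodput as the ECN cheat.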
{Note 4} Again, feeding back no marks at all would be naive, because
it would drive the bottleneck into overload, causing it to turn off
ECN (and driving the loss-rate over a cliff). A better strategy is to
feedback only a small proportion. Because TCP's rate depends on the
square root of the congestion probability, to download N times
faster, the receiver should feed back only about 1 in N^2 of the
marks or losses. E.g. to go 90 times faster, feed back 1 in 8100
marks (or losses).
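The square-root relationship in Note 4 can be expressed as a one-line
function (illustrative name, standard TCP rate model where rate is
proportional to 1/sqrt(p)):

```python
def feedback_fraction(speedup):
    """TCP rate ~ 1/sqrt(p), so to appear `speedup` times faster the
    receiver must make the apparent congestion probability p/speedup**2,
    i.e. feed back only 1 in speedup**2 of the marks (or losses)."""
    return 1 / speedup ** 2

# e.g. to go 90 times faster, feed back 1 in 8100 marks
assert feedback_fraction(90) == 1 / 8100
```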
{Note 5} There are two classes of apps that use TCP but can get away
without reliable delivery:
i) Some streaming media apps are designed with a loss-tolerant
encoding, so they can use TCP but play out the media even if some
retransmissions haven't arrived yet (e.g. using a raw socket at the receiver).
ii) In the specific case of HTTP, a hacked receiver can open another
connection to the same server and download the byte-ranges it needs
to repair the holes in the other connection.
{Note 6} ConEx (congestion exposure [RFC6789]) provides a
comprehensive framework for the network to prevent senders and
receivers from cheating. However, for this argument, we don't need to
go there either.
Cheers
Bob
David Lang
[snip]
ECN has potential cheating problems, but we have per-customer queues anyway.
Using flow as the unit of allocation also has its own problems,
with no proposed solutions.
[snip]
Bob
At 05:16 30/03/2015, David Lang wrote:
[snip]
While AQM makes the network usable, there is still additional room
for improvement. While dropping packets does result in the TCP
senders slowing down, and eventually stabilizing at around the
right speed to keep the link fully utilized, the only way that
senders have been able to detect problems is to discover that they
have not received an ack for the traffic within the allowed time.
This causes a 'bubble' in the flow as the dropped packet must be
retransmitted (and sometimes a significant amount of data after
the dropped packet that did make it to the destination, but could
not be acked because of the missing packet).
This "bubble" in the data flow can be greatly compressed by
configuring the AQM algorithm to send an ECN packet to the sender
when it drops a packet in a flow. The sender can then adapt
faster, slowing down its new data and re-sending the dropped
packet without having to wait for the timeout. This has two major
effects: by allowing the sender to retransmit the packet sooner,
the delay on the dropped data is not as long; and because the
replacement data can arrive before the timeout of the following
packets, they may not need to be re-sent. By configuring the AQM
algorithm to send the ECN notification to the sender only when the
packet is being dropped, the effect of a failure of the ECN packet
to get through to the sender (the notification packet runs into
congestion and gets dropped, some network device blocks it, etc.)
is that the ECN-enabled case devolves to match the non-ECN case, in
that the sender will still detect the dropped packet via the
timeout waiting for the ack, as if ECN were not enabled.
<insert link to possible problems that can happen here, including
the potential for an app to 'game' things if packets are marked at
a different level than when they are dropped.>
So, a very strong recommendation: enable Active Queue Management.
While the different algorithms have different advantages and
levels of testing, even the 'worst' of the set results in a
night-and-day improvement in usability compared to unmanaged buffers.
Enabling ECN at the same point as dropping packets, as part of
enabling any AQM algorithm, results in a noticeable improvement over
the base algorithm without ECN. When compared to the baseline, the
improvement added by ECN is tiny compared to the improvement from enabling AQM.
Is it fair to say that the difference between plain AQM and AQM+ECN
is on the same order as the differences between the different
AQM algorithms?
Future research items (which others here may already have done,
and would not be part of my 'elevator pitch')
I believe that currently ECN triggers the exact same slowdown that
a missed packet does, and it may be appropriate to have the sender
do a less drastic slowdown.
It would be very interesting to provide some way for the
application sending the traffic to detect dropped packets and ECN
responses. For example, a streaming media source (especially an
interactive one like video conferencing) could adjust the bitrate
that it is sending.
David Lang
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
________________________________________________________________
Bob Briscoe, BT
________________________________________________________________