David,

At 22:46 13/04/2015, David Lang wrote:
On Mon, 13 Apr 2015, Bob Briscoe wrote:

David,

Returning from a fortnight offlist...

I think your conception of how ECN works is incorrect. You describe ECN as if the AQM marks one packet when it drops another packet. You say that the ECN-mark speeds up the retransmission of the dropped packet. On the contrary, the idea of classic ECN [RFC3168] is that the ECN marks replace the drops. In all known testing (except pathological cases), classic ECN effectively eliminates drops for all ECN-capable packets.

That's what I thought, and if that was the case, then marking packets as ECN-capable would mean that they would have an advantage over non-ECN packets (by not getting dropped, so getting a higher share of bandwidth)

This is a common fallacy. An ECN-capable TCP achieves the same throughput as an otherwise identical non-ECN TCP. The fallacy comes from thinking that the network caps the rate by removing packets. On the contrary, the source determines the rate, from how many drop or mark signals it sees. For today's ('classic') ECN, the source's rate reduction in response to either signal is the same, and in both cases the reduction is voluntary.

There will be a tiny difference in goodput, because of the retransmissions. However, the loss (or marking) probability that TCP uses to determine its rate is a fraction of a percent, so this difference is in the noise.


that's what the gaming ECN thread was about, and if I understood the responses, I was being told that marking packets as ECN-capable, but not slowing down (actually responding to ECN) would not let an application get any advantage because the packets would just end up getting dropped anyway, since marking and dropping happen at the same level, even on ECN-capable flows.

Nope.

I recall that thread. People came up with a number of complicated arguments for why it is hard to game ECN, but none were solid. Below I have given a simple argument that I think is solid on its own. I thought about intervening at the time, but this stuff needs care and time that I didn't have then.

1) For the simplest and most complete argument, all you need to know about the ECN behaviour of AQMs is this: under normal load conditions, an AQM decides it's time to send a congestion signal irrespective of whether the next packet is ECN-capable or not. Then, if the next packet is ECN-capable, it marks it; otherwise it drops it. This is from RFC3168, which also requires the source to respond equally to either a loss or a mark. I call this 'classic' ECN.{Note 1}
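To make (1) concrete, here is a minimal sketch (not any real AQM's code; the function name is mine) of that decision order: the AQM decides to signal first, and only then does the ECN field of the packet choose the form of the signal:

```python
def aqm_action(signal_now: bool, packet_is_ect: bool) -> str:
    """Classic ECN (RFC 3168): the AQM first decides whether it is
    time to signal congestion, without looking at the packet. Only
    then does the packet's ECN field select mark vs drop."""
    if not signal_now:
        return "forward"            # no congestion signal due
    return "mark" if packet_is_ect else "drop"

# The signalling decision is identical for both flow types;
# only the form of the signal differs.
print(aqm_action(True, True))       # mark
print(aqm_action(True, False))      # drop
print(aqm_action(False, True))      # forward
```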

2) I will try to correct your misunderstanding about "marking and dropping at the same level even on ECN-capable flows". However, to determine whether ECN can be gamed, there's no need to go there. So I'll come back to that as a post-script{Note 2}.

3) I will prove that it is as easy to game loss as it is to game ECN, first considering sender cheating, then receiver cheating:

3a) Sender Cheating
From the sender's point of view, the only difference between a loss and an ECN mark is that it has to retransmit a loss. But that has nothing to do with the rate it can go at. If it has been programmed to ignore congestion feedback (and instead to go at a constant unresponsive rate{Note 2}), it is as easy for it to ignore loss feedback as ECN feedback. See {Note 3} for an example.

3b) Receiver Cheating
* An ECN receiver can best fool an ECN-capable TCP sender into going faster by only feeding back a small fraction of ECN marks.{Note 4}
* A non-ECN receiver could fool a non-ECN TCP sender into going faster by only revealing a small fraction of the losses. However, it would have to ACK undelivered bytes, and most TCP-based apps won't work unless all bytes are delivered.{Note 5}

So it seems that it's easier for a receiver to game ECN than loss. However:
* returning to the ECN case, the sender can validate the receiver by randomly setting an ECN mark itself on a very small proportion of packets (probably only on unusually high rate connections). Then if it doesn't see ECN feedback on the ACK of any one of its self-inserted marks, it can close the connection.
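As a sketch of that validation step (the function name and sequence numbers are hypothetical, purely to illustrate the check):

```python
def receiver_is_honest(self_marked_seqs, echoed_seqs):
    """The sender set CE itself on a tiny random sample of packets
    (self_marked_seqs). An honest receiver must echo every one of
    them; any missing echo betrays suppressed ECN feedback."""
    return set(self_marked_seqs) <= set(echoed_seqs)

# Honest receiver: echoes the self-inserted marks (plus any real ones).
print(receiver_is_honest({101, 5432}, {101, 5432, 9000}))   # True
# Cheat: hides the mark on segment 5432 -> sender closes the connection.
print(receiver_is_honest({101, 5432}, {101}))               # False
```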

In summary,
* a sender can't game ECN any more easily than it can game loss.
* a receiver can only game ECN if the sender doesn't take measures to prevent it.{Note 6}

If the packets are just marked, but not dropped, then the ECN-capable flows will occupy a disproportionate share of the available buffer space, since they just get marked instead of dropped.

Nope.

The arrival rates will be the same, whether or not ECN is used (see earlier). And recall that TCP drives the marking or loss probability to a small fraction of a percent in all normal conditions.

Example: if there are 10 flows in a 100Mb/s link, 5 ECN and 5 non-ECN, they will all arrive at the buffer at 10Mb/s (all other factors being equal). Then, if the loss or marking probability is 0.5%, the AQM will be marking but not dropping 1 in 200 packets in the ECN flows whereas it would drop 1 in 200 from the non-ECN flows.

So, assuming tail drop, if there were 399 packets in this buffer, on average 200 would be ECN-capable (40 in each ECN flow) with one of them marked; and 199 would be non-ECN-capable (40 in each flow except one with 39). And one of those 199 would be a retransmission of an earlier loss.

[Of course, we would hope that there would be 4 packets in the buffer, not 400. The proportions would still be the same on average. I merely used 399 to avoid fractions of packets for the averages.]
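Re-deriving the numbers in that example (just the arithmetic, assuming the 399-packet average above):

```python
ecn_flows = 5
p = 1 / 200                    # 0.5% loss/marking probability

ecn_pkts = 200                 # ECN-capable packets in the buffer
non_ecn_pkts = 199             # one of 200 non-ECN packets was dropped
assert ecn_pkts + non_ecn_pkts == 399

per_ecn_flow = ecn_pkts // ecn_flows     # 40 packets in each ECN flow
marked = round(ecn_pkts * p)             # 1 of them carries an ECN mark
dropped = round((non_ecn_pkts + 1) * p)  # the 1 packet already dropped
print(per_ecn_flow, marked, dropped)     # 40 1 1
```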


===Footnotes===

{Note 1} To be concrete, I've assumed classic ECN [RFC3168]. The argument is similar for research approaches like "think once to mark, twice to drop", but let's not make it more complicated than it needs to be.

{Note 2} RFC3168 (and draft-ietf-aqm-recommendation) require that, whenever the AQM decides it is time to signal congestion on the next packet, if the queue has been persistently long, the AQM must only use drop as a congestion signal, irrespective of whether the next packet is ECN-capable or not.

So, if a source naively just continued to increase its window until it drove the queue into overload, it would cause the AQM to turn off ECN, and consequently it would not be able to game ECN. But the simple strategy of sending at a high but constant rate avoids driving the queue into overload, so that's the strategy I described for gaming ECN: strategies that don't work are irrelevant if there's a strategy that does.

{Note 3} Examples to show source cheating is as easy with loss as ECN:
* An ECN source sends at a constant unresponsive 90Mb/s through a 100Mb/s bottleneck. In parallel some other responsive flows (say 10 non-ECN TCP flows) squeeze themselves into the remaining 10Mb/s. They will cause themselves (say) 0.5% loss probability, while the unresponsive flow will experience 0.5% marking and zero loss.
* A non-ECN source can just as easily send unresponsively at 90.5Mb/s as 90Mb/s. The other flows will still drive loss to about 0.5%, which the unresponsive flow will now experience as well. Nonetheless, after it retransmits the 0.5% loss it still achieves goodput of about 90Mb/s.
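The goodput arithmetic in that second case, spelled out with the example figures (a sketch, not a measurement):

```python
p = 0.005                    # loss probability driven by the responsive flows

ecn_rate = 90.0              # Mb/s: marked at 0.5%, never dropped
ecn_goodput = ecn_rate       # marks cost no bytes, so goodput == send rate

non_ecn_rate = 90.5          # Mb/s: the non-ECN cheat sends slightly faster
non_ecn_goodput = non_ecn_rate * (1 - p)   # 0.5% is lost and retransmitted
print(non_ecn_goodput)       # about 90.05 Mb/s -- same as the ECN flow
```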

{Note 4} Again, feeding back no marks at all would be naive, because it would drive the bottleneck into overload, causing it to turn off ECN (and driving the loss-rate over a cliff). A better strategy is to feed back only a small proportion. Because TCP's rate depends on the square root of the congestion probability, to download N times faster, the receiver should feed back only about 1 in N^2 of the marks or losses. E.g. to go 90 times faster, feed back 1 in 8100 marks (or losses).
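Note 4's arithmetic as a sketch, assuming the usual square-root model of TCP throughput (rate proportional to 1/sqrt(p)):

```python
import math

def feedback_fraction(speedup):
    """TCP rate scales as 1/sqrt(p), so to appear `speedup` times
    faster a cheating receiver feeds back only 1/speedup**2 of the
    congestion signals it actually sees."""
    return 1.0 / speedup ** 2

# To go 90x faster: feed back 1 mark in 90^2 = 8100 (Note 4's figure).
assert feedback_fraction(90) == 1 / 8100
# Consistency with the square-root law: sqrt(8100) = 90.
assert math.isclose(math.sqrt(1 / feedback_fraction(90)), 90.0)
```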

{Note 5} There are two classes of apps that use TCP but can get away without reliable delivery: i) Some streaming media apps are designed with a loss-tolerant encoding, so they can use TCP but play out the media even if some retransmissions haven't arrived yet (e.g. using a raw socket at the receiver). ii) In the specific case of HTTP, a hacked receiver can open another connection to the same server and download the byte-ranges it needs to repair the holes in the other connection.

{Note 6} ConEx (congestion exposure [RFC6789]) provides a comprehensive framework for the network to prevent senders and receivers from cheating. However, for this argument, we don't need to go there either.

Cheers



Bob



David Lang

[snip]



ECN has potential cheating problems, but we have per-customer queues anyway.
Using flow as the unit of allocation also has its own problems, with no proposed solutions.

[snip]


Bob

At 05:16 30/03/2015, David Lang wrote:

[snip]

While AQM makes the network usable, there is still additional room for improvement. While dropping packets does result in the TCP senders slowing down, and eventually stabilizing at around the right speed to keep the link fully utilized, the only way that senders have been able to detect problems is to discover that they have not received an ack for the traffic within the allowed time. This causes a 'bubble' in the flow as the dropped packet must be retransmitted (and sometimes a significant amount of data after the dropped packet that did make it to the destination, but could not be acked because of the missing packet).

This "bubble" in the data flow can be greatly compressed by configuring the AQM algorithm to send an ECN packet to the sender when it drops a packet in a flow. The sender can then adapt faster, slowing down its new data and re-sending the dropped packet without having to wait for the timeout. This has two major effects: by allowing the sender to retransmit the packet sooner, the delay on the dropped data is not as long; and because the replacement data can arrive before the timeout of the following packets, they may not need to be re-sent.

By configuring the AQM algorithm to send the ECN notification to the sender only when the packet is being dropped, the effect of a failure of the ECN packet to get through to the sender (the notification packet runs into congestion and gets dropped, some network device blocks it, etc.) is that the ECN-enabled case devolves to match the non-ECN case: the sender will still detect the dropped packet via the timeout waiting for the ack, as if ECN were not enabled. <insert link to possible problems that can happen here, including the potential for an app to 'game' things if packets are marked at a different level than when they are dropped.>

So: a very strong recommendation to enable Active Queue Management. While the different algorithms have different advantages and levels of testing, even the 'worst' of the set results in a night-and-day improvement in usability compared to unmanaged buffers. Enabling ECN at the same point as dropping packets, as part of enabling any AQM algorithm, results in a noticeable improvement over the base algorithm without ECN. But measured against the baseline, the improvement added by ECN is tiny compared to the improvement from enabling AQM.

Is it fair to say that the plain-AQM vs AQM+ECN variation is of the same order as the differences between the different AQM algorithms?

Future research items (which others here may already have done, and would not be part of my 'elevator pitch'): I believe that currently ECN triggers the exact same slowdown that a missed packet does, and it may be appropriate to have the sender do a less drastic slowdown. It would also be very interesting to provide some way for the application sending the traffic to detect dropped packets and ECN responses. For example, a streaming media source (especially an interactive one like video conferencing) could adjust the bitrate that it's sending.
David Lang
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm

________________________________________________________________
Bob Briscoe,                                                  BT


