David,
At 22:46 13/04/2015, David Lang wrote:
On Mon, 13 Apr 2015, Bob Briscoe wrote:
David,
Returning from a fortnight offlist...
I think your conception of how ECN works is incorrect. You describe
ECN as if the AQM marks one packet when it drops another packet.
You say that the ECN-mark speeds up the retransmission of the
dropped packet. On the contrary, the idea of classic ECN [RFC3168]
is that the ECN marks replace the drops. In all known testing
(except pathological cases), classic ECN effectively eliminates
drops for all ECN-capable packets.
That's what I thought, and if that was the case, then marking
packets as ECN-capable would mean that they would have an advantage
over non-ECN packets (by not getting dropped, so getting a higher
share of bandwidth)
This is a common fallacy. An ECN-capable TCP achieves the same
throughput as an otherwise identical non-ECN TCP. The fallacy comes
from people who think that the network caps the rate by removing
packets. But the source determines the rate by how many drop or mark
signals it sees. For today's ('classic') ECN, the source's rate
reduction in response to either signal is the same; and in both cases
the reduction is voluntary.
There will be a tiny difference in goodput, because of the
retransmissions. However, the loss (or marking) probability that TCP
uses to determine its rate is a fraction of a percent, so this
difference is in the noise.
that's what the gaming ECN thread was about, and if I understood the
responses, I was being told that marking packets as ECN-capable, but
not slowing down (actually responding to ECN) would not let an
application get any advantage because the packets would just end up
getting dropped anyway, since marking and dropping happen at the
same level, even on ECN-capable flows.
Nope.
I recall that thread. People came up with a number of complicated
arguments for why it is hard to game ECN, but none were solid. Below
I have given a simple argument that I think is solid on its own. I
thought about intervening at the time, but this stuff needs care and
time that I didn't have then.
1) For the most simple and complete argument, all you need to know
about the ECN behaviour of AQMs is: under normal load conditions, an
AQM decides it's time to send a congestion signal irrespective of
whether the next packet is ECN-capable or not. Then, if the next
packet is ECN-capable, it marks it, else it drops it. This is from
RFC3168, which also requires the source to respond equally to either
a loss or a mark. I call this 'classic' ECN.{Note 1}
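The decision order described in point 1 can be sketched in a few lines
of Python (a toy model, with made-up names; not from any real AQM
implementation):

```python
import random

ECT = "ECT"          # ECN-capable transport codepoint
CE = "CE"            # congestion experienced mark
NOT_ECT = "not-ECT"  # not ECN-capable

def aqm_signal(ecn_field, signal_probability):
    """Classic ECN [RFC3168] sketch: the AQM first decides whether to
    signal congestion at all, irrespective of the packet's ECN field.
    Only then does it choose mark vs drop based on ECN capability."""
    if random.random() >= signal_probability:
        return ecn_field, "forward"   # no congestion signal this time
    if ecn_field == ECT:
        return CE, "forward"          # ECN-capable: mark instead of drop
    return ecn_field, "drop"          # not ECN-capable: drop
```

The key point the sketch captures is that the signalling decision and
the mark-vs-drop choice are two separate steps.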
2) I will try to correct your misunderstanding about "marking and
dropping at the same level even on ECN-capable flows". However, to
determine whether ECN can be gamed, there's no need to go there. So
I'll come back to that as a post-script{Note 2}.
3) I will prove that it is as easy to game loss as it is to game ECN,
first considering sender cheating, then receiver cheating:
3a) Sender Cheating
From the sender's point of view, the only difference between a loss
and an ECN mark is that it has to retransmit a loss. But that has
nothing to do with the rate it can go at. If it has been programmed
to ignore congestion feedback (and instead to go at a constant
unresponsive rate{Note 2}), it is as easy for it to ignore loss
feedback as ECN feedback. See {Note 3} for an example.
3b) Receiver Cheating
* An ECN receiver can best fool an ECN-capable TCP sender into going
faster by only feeding back a small fraction of ECN marks.{Note 4}
* A non-ECN receiver could fool a non-ECN TCP sender into going
faster by only revealing a small fraction of the losses. However, it
would have to ACK undelivered bytes, and most TCP-based apps won't
work unless all bytes are delivered.{Note 5}
So it seems that it's easier for a receiver to game ECN than loss. However:
* returning to the ECN case, the sender can validate the receiver by
randomly setting an ECN mark itself on a very small proportion of
packets (probably only on unusually high rate connections). Then if
it doesn't see ECN feedback on the ACK of any one of its
self-inserted marks, it can close the connection.
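The sender-side validation just described could be sketched like this
(all names are illustrative; this is not from any real TCP stack):

```python
import random

class EcnFeedbackValidator:
    """Sketch of the check above: the sender occasionally sets a CE
    mark itself, and requires the receiver to echo every such mark."""

    def __init__(self, probe_probability=0.0001):
        self.probe_probability = probe_probability
        self.outstanding_probes = set()

    def maybe_probe(self, seq):
        """Before sending packet `seq`, decide whether to self-mark it CE."""
        if random.random() < self.probe_probability:
            self.outstanding_probes.add(seq)
            return True    # set CE on this packet ourselves
        return False

    def on_ack(self, seq, ece_flag):
        """If the ACK of a self-marked packet carries no ECN-Echo, the
        receiver is suppressing feedback: abort the connection."""
        if seq in self.outstanding_probes:
            self.outstanding_probes.discard(seq)
            if not ece_flag:
                raise ConnectionAbortedError(
                    "receiver suppressed ECN feedback")
```

A cheating receiver cannot tell a self-inserted mark from a genuine
one, so it cannot safely suppress any of them.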
In summary,
* a sender can't game ECN any more easily than it can game loss.
* a receiver can only game ECN if the sender doesn't take measures to
prevent it.{Note 6}
If the packets are just marked, but not dropped, then the
ECN-capable flows will occupy a disproportionate share of the
available buffer space, since they just get marked instead of dropped.
Nope.
The arrival rates will be the same, whether or not ECN is used (see
earlier). And recall that TCP drives the marking or loss probability
at very small fractions in all normal conditions.
Example: if there are 10 flows in a 100Mb/s link, 5 ECN and 5
non-ECN, they will all arrive at the buffer at 10Mb/s (all other
factors being equal). Then, if the loss or marking probability is
0.5%, the AQM will be marking but not dropping 1 in 200 packets in
the ECN flows whereas it would drop 1 in 200 from the non-ECN flows.
So, assuming tail drop, if there were 399 packets in this buffer, on
average 200 would be ECN-capable (40 in each of the 5 ECN flows) with
one of them marked; and 199 would be non-ECN-capable (40 in each flow
except one with 39). And one of those 199 would be a retransmission
from an earlier loss.
[Of course, we would hope that there would be 4 packets in the
buffer, not 400. The proportions would still be the same on average.
I merely used 399 to avoid fractions of packets for the averages.]
===Footnotes===
{Note 1} To be concrete, I've assumed classic ECN [RFC3168]. The
argument is similar for research approaches like "think once to mark,
twice to drop", but let's not make it more complicated than it needs to be.
{Note 2} RFC3168 (and draft-ietf-aqm-recommendation) require that,
whenever the AQM decides it is time to signal congestion on the next
packet, if the queue has been persistently long, the AQM must only
use drop as a congestion signal, irrespective of whether the next
packet is ECN-capable or not.
So, if a source naively just continues to increase its window until
it drives the queue into overload, then it would cause the AQM to
turn off ECN and consequently not be able to game ECN. But the simple
strategy of sending at a high but constant rate avoids driving the
queue into overload. So that's the strategy I described for gaming
ECN. Because strategies that don't work are irrelevant if there's a
strategy that does work.
{Note 3} Examples to show source cheating is as easy with loss as ECN:
* An ECN source sends at a constant unresponsive 90Mb/s through a
100Mb/s bottleneck. In parallel some other responsive flows (say 10
non-ECN TCP flows) squeeze themselves into the remaining 10Mb/s. They
will cause themselves (say) 0.5% loss probability, while the
unresponsive flow will experience 0.5% marking and zero loss.
* A non-ECN source can just as easily send unresponsively at 90.5Mb/s
as 90Mb/s. The other flows will still drive loss to about 0.5%, which
the unresponsive flow will now experience as well. Nonetheless, after
it retransmits the 0.5% loss it still achieves goodput of about 90Mb/s.
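The goodput arithmetic in these two bullets can be checked in a couple
of lines (toy arithmetic under the stated 0.5% figures):

```python
# Unresponsive ECN flow: sends 90 Mb/s, sees 0.5% marks, zero loss.
ecn_send = 90.0
ecn_goodput = ecn_send                       # marks cost no retransmissions

# Unresponsive non-ECN flow: sends 90.5 Mb/s, loses 0.5% to drops.
nonecn_send = 90.5
loss = 0.005
nonecn_goodput = nonecn_send * (1 - loss)    # ~90.05 Mb/s after retransmits

print(ecn_goodput, nonecn_goodput)
```

So the non-ECN cheat simply sends 0.5% faster and ends up with
essentially the same goodput as the ECN cheat.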
{Note 4} Again, feeding back no marks at all would be naive, because
it would drive the bottleneck into overload, causing it to turn off
ECN (and driving the loss-rate over a cliff). A better strategy is to
feedback only a small proportion. Because TCP's rate depends on the
square root of the congestion probability, to download N times
faster, the receiver should feed back only about 1 in N^2 of the
marks or losses. E.g. to go 90 times faster, feed back 1 in 8100
marks (or losses).
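The square-root relationship in Note 4 can be expressed as a one-line
function (illustrative name, standard TCP rate model where rate is
proportional to 1/sqrt(p)):

```python
def feedback_fraction(speedup):
    """TCP rate ~ 1/sqrt(p), so to appear `speedup` times faster the
    receiver must make the apparent congestion probability p/speedup**2,
    i.e. feed back only 1 in speedup**2 of the marks (or losses)."""
    return 1 / speedup ** 2

# e.g. to go 90 times faster, feed back 1 in 8100 marks
assert feedback_fraction(90) == 1 / 8100
```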
{Note 5} There are two classes of apps that use TCP but can get away
without reliable delivery:
i) Some streaming media apps are designed with a loss-tolerant
encoding, so they can use TCP but play out the media even if some
retransmissions haven't arrived yet (e.g. using a raw socket at the receiver).
ii) In the specific case of HTTP, a hacked receiver can open another
connection to the same server and download the byte-ranges it needs
to repair the holes in the other connection.
{Note 6} ConEx (congestion exposure [RFC6789]) provides a
comprehensive framework for the network to prevent senders and
receivers from cheating. However, for this argument, we don't need to
go there either.
Cheers
Bob
David Lang
[snip]
ECN has potential cheating problems, but we have per-customer queues anyway.
Using flow as the unit of allocation also has its own problems,
with no proposed solutions.
[snip]
Bob
At 05:16 30/03/2015, David Lang wrote:
[snip]
While AQM makes the network usable, there is still additional room
for improvement. While dropping packets does result in the TCP
senders slowing down, and eventually stabilizing at around the
right speed to keep the link fully utilized, the only way that
senders have been able to detect problems is to discover that they
have not received an ack for the traffic within the allowed time.
This causes a 'bubble' in the flow as the dropped packet must be
retransmitted (and sometimes a significant amount of data after
the dropped packet that did make it to the destination, but could
not be acked because of the missing packet).
This "bubble" in the data flow can be greatly compressed by
configuring the AQM algorithm to send an ECN packet to the sender
when it drops a packet in a flow. The sender can then adapt
faster, slowing down its new data and re-sending the dropped
packet without having to wait for the timeout. This has two major
effects: by allowing the sender to retransmit the packet sooner,
the delay on the dropped data is not as long; and because the
replacement data can arrive before the timeout of the following
packets, they may not need to be re-sent. By configuring the AQM
algorithm to send the ECN notification to the sender only when the
packet is being dropped, the effect of a failure of the ECN packet
to get through to the sender (the notification packet runs into
congestion and gets dropped, some network device blocks it, etc.)
is that the ECN-enabled case devolves to match the non-ECN case, in
that the sender will still detect the dropped packet via the
timeout waiting for the ack, as if ECN were not enabled.
<insert link to possible problems that can happen here, including
the potential for an app to 'game' things if packets are marked at
a different level than when they are dropped.>
So, a very strong recommendation: enable Active Queue Management.
While the different algorithms have different advantages and
levels of testing, even the 'worst' of the set results in a
night-and-day improvement in usability compared to unmanaged buffers.
Enabling ECN at the same point as dropping packets, as part of
enabling any AQM algorithm, results in a noticeable improvement over
the base algorithm without ECN. When compared to the baseline, the
improvement added by ECN is tiny compared to the improvement from enabling AQM.
Is it fair to say that the difference between plain AQM and AQM+ECN
is on the same order as the differences between the different
AQM algorithms?
Future research items (which others here may already have done,
and would not be part of my 'elevator pitch')
I believe that currently ECN triggers the exact same slowdown that
a missed packet does, and it may be appropriate to have the sender
do a less drastic slowdown.
It would be very interesting to provide some way for the
application sending the traffic to detect dropped packets and ECN
responses. For example, a streaming media source (especially an
interactive one like video conferencing) could adjust the bitrate
that it is sending.
David Lang
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
________________________________________________________________
Bob Briscoe, BT
________________________________________________________________