On Sat, 28 Mar 2015, Scheffenegger, Richard wrote:

> David,
>
> Perhaps you would care to provide some text to address the misconception
> that you pointed out? (To wait for a 100% fix as a 90% fix appears much
> less appealing, while the current state of the art is at 0%)
Ok, you put me on the spot :-) Here goes.
If you think that aqm-recommendations is not strongly enough worded, I think
this particular discussion (to AQM or not) really belongs there. The other
document (ecn-benefits) has a different target in arguing for going those last
10%...

So here is my "elevator pitch" on the problem. Feel free to take anything I say
here for any purpose, and I'm sure I'll get corrected for anything I am wrong
on.
Problem statement: Transmit buffers are needed to keep the link fully
utilized, but excessive buffering results in poor latency for all traffic.
This latency is frequently bad enough to cause some types of traffic to fail
entirely.
<link to more background goes here, including how separate benchmarks for
throughput and latency have misled people, "packet loss considered evil",
cheaper memory encouraging larger buffers, etc. Include tests like
netperf-wrapper and ping latency while under load, etc. Include examples where
buffers have resulted in latencies so long that packets are retransmitted
before the first copy gets to the destination>
Traditionally, transmit buffers have been sized to hold a fixed number of
packets. Due to the variation in packet sizes, it is impossible to tune this
value to keep the link fully utilized when small packets dominate the traffic
without making the queue large enough to cause latency problems when large
packets dominate the traffic.
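To make that concrete, here is a quick back-of-the-envelope sketch (the
1000-packet buffer and 100 Mbit/s link are assumed example numbers, not from
any particular device):

```python
# Assumed example numbers: a 1000-packet transmit buffer on a 100 Mbit/s link.
BUFFER_PACKETS = 1000
LINK_BPS = 100e6  # 100 Mbit/s

def worst_case_latency_ms(packet_bytes):
    """Time to drain a buffer full of packets of this one size."""
    return BUFFER_PACKETS * packet_bytes * 8 / LINK_BPS * 1000

small = worst_case_latency_ms(64)    # tiny packets (acks, DNS, VoIP)
large = worst_case_latency_ms(1500)  # full-size Ethernet frames

print(f"64-byte packets:   {small:.2f} ms of queue")
print(f"1500-byte packets: {large:.2f} ms of queue")
```

The same packet count that gives about 5 ms of worst-case queueing delay with
small packets gives 120 ms with large ones; no single packet-count value is
right for both.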
Shifting to byte queue limits, where queues are allowed to hold a variable
number of packets depending on how large they are, makes it possible to
manually tune the transmit buffer size to get good latency under all traffic
conditions at a given speed. However, this step forward revealed two
additional problems.
1. Whenever the data rate changes, this value needs to be manually changed
(multi-link paths lose a link, noise degrades the maximum throughput of a
link, etc.)

2. High-volume flows (i.e. bulk downloads) can starve other flows (DNS
lookups, VoIP, gaming, etc.). This happens because space in the queue is
allocated on a first-come-first-served basis, so the high-volume traffic fills
the queue (at which point it starts to be dropped), but all other traffic that
tries to arrive is also dropped. It turns out that these light flows tend to
have a larger effect on the user experience than the heavier flows, because
things tend to be serialized behind the lighter flows (a DNS lookup before
doing a large download, retrieving a small HTML page to find what additional
resources need to be fetched to display a page), or the user experience is
directly affected by the light flows (gaming lag, VoIP drops, etc.)
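Here is a toy simulation of that tail-drop starvation (all the numbers and
flow names are invented for illustration; it is not modeling any real device):

```python
from collections import deque

CAPACITY = 8                    # tail-drop FIFO, sized in packets
queue = deque()
dropped = {"heavy": 0, "light": 0}

def enqueue(flow):
    if len(queue) >= CAPACITY:
        dropped[flow] += 1      # tail drop: whoever arrives next loses
    else:
        queue.append(flow)

# The heavy flow sends faster than the link drains, so the queue is
# almost always full by the time an occasional light packet shows up.
for i in range(200):
    enqueue("heavy")
    if i % 20 == 0:
        enqueue("light")        # e.g. a DNS query or a VoIP frame
    if i % 3 == 0 and queue:
        queue.popleft()         # the link drains one packet per 3 arrivals

print(dropped)
```

The heavy flow still gets plenty of packets through (it owns the whole queue),
while almost every light packet is lost, even though the light flow asked for
a tiny fraction of the capacity.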
Active Queue Management addresses these problems by adapting the amount of
data that is buffered to match the transmission capacity of the link, and by
preventing high-volume flows from starving low-volume flows, without the need
to implement QoS classifications.
<insert link about how you can't trust QoS tags that are made by other
organizations, ways that it can be abused, etc>
This is possible because AQM algorithms don't have to drop the newly arriving
packet; the algorithm can decide to drop a packet from one of the heavy flows
rather than from one of the lightweight flows.
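As a sketch of that idea (this is not the real logic of any particular
algorithm, just the "pick on the heaviest flow" principle with invented
numbers):

```python
from collections import defaultdict, deque

LIMIT = 8                    # total packets allowed across all flows
flows = defaultdict(deque)   # one small queue per flow

def enqueue(flow, pkt):
    flows[flow].append(pkt)
    if sum(len(q) for q in flows.values()) > LIMIT:
        # Over the limit: drop from the head of the *longest* queue
        # rather than the packet that just arrived, so light flows
        # are never squeezed out by a heavy one.
        fattest = max(flows, key=lambda f: len(flows[f]))
        flows[fattest].popleft()

for i in range(35):
    enqueue("bulk", i)       # heavy download traffic
    if i % 10 == 0:
        enqueue("dns", i)    # occasional light-flow packet
```

Every drop lands on the bulk flow; all of the light flow's packets survive.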
<insert references to currently favored AQM options here, PIE, fq_codel, cake,
???. Also links to failed approaches>
Turning on AQM on every bottleneck link makes the Internet usable for
everyone, no matter what sort of application they are using.
<insert link on how to deal with equipment you can't configure by throttling
bandwidth before the bottleneck and/or doing ingress shaping of traffic>
While AQM makes the network usable, there is still additional room for
improvement. Dropping packets does result in the TCP senders slowing down, and
eventually stabilizing at around the right speed to keep the link fully
utilized, but the only way the senders have been able to detect the problem is
to discover that they have not received an ack for the traffic within the
allowed time. This causes a 'bubble' in the flow, as the dropped packet must
be retransmitted (and sometimes a significant amount of data after the dropped
packet that did make it to the destination, but could not be acked because of
the missing packet).
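To put rough numbers on the size of that bubble (the RTT, timeout, and link
speed below are assumed example values, just to give a feel for the scale):

```python
# Assumed example values for one flow through a congested bottleneck.
rtt_ms = 50      # round-trip time
rto_ms = 200     # retransmission timeout the sender must wait out
link_mbps = 100  # bottleneck capacity

# The flow stalls for the timeout, then needs one more round trip
# before the retransmitted packet is acked.
bubble_ms = rto_ms + rtt_ms

# How much link capacity passes by while this flow is stalled.
idle_capacity_kb = link_mbps * 1000 * bubble_ms / 1000 / 8

print(f"stall: ~{bubble_ms} ms, ~{idle_capacity_kb:.0f} KB of capacity")
```

A quarter-second stall on a fast link is a lot of wasted capacity, and it is
directly visible to the user as a hiccup.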
This "bubble" in the data flow can be almost entirely eliminated by enabling
ECN in the AQM algorithm. Instead of dropping a packet from a flow that has
negotiated ECN, the algorithm sets a congestion mark in the packet's IP header
and delivers it anyway; the receiver then echoes the mark back to the sender
in its acks. The sender can adapt right away, slowing down its new data,
without anything having to be retransmitted. This has two major effects: since
the marked packet still arrives, there is no delay waiting for replacement
data, and the packets following it don't get held up or re-sent either. By
configuring the AQM algorithm to mark a packet only at the point where it
would otherwise drop it, the effect of the congestion signal failing to reach
the sender (the marked packet or the ack echoing it runs into congestion and
gets dropped, some network device clears the mark, etc.) is that the
ECN-enabled case devolves to match the non-ECN case: congestion ends up being
signaled by an actual drop, which the sender detects just as if ECN was not
enabled.
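A sketch of that "mark where you would otherwise drop" rule (hypothetical
field names, and a simple tail threshold standing in for a real AQM's drop
decision):

```python
THRESHOLD = 8   # queue depth at which this toy AQM would start dropping

def on_arrival(queue, pkt):
    """Enqueue a packet, marking instead of dropping for ECN flows."""
    if len(queue) < THRESHOLD:
        queue.append(pkt)
        return "queued"
    if pkt.get("ect"):           # endpoints negotiated ECN for this flow
        pkt["ce"] = True         # set the congestion mark...
        queue.append(pkt)        # ...but still deliver the data
        return "marked"
    return "dropped"             # non-ECN flow: loss is the only signal
```

Because the mark is only applied where a drop would have happened, losing the
mark (or the ack that echoes it) just leaves the flow on the ordinary
loss-detection path.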
<insert link to possible problems that can happen here, including the potential
for an app to 'game' things if packets are marked at a different level than
when they are dropped.>
So, a very strong recommendation to enable Active Queue Management: while the
different algorithms have different advantages and levels of testing, even the
'worst' of the set results in a night-and-day improvement in usability
compared to unmanaged buffers.
Enabling ECN at the same point as dropping packets, as part of enabling any
AQM algorithm, results in a noticeable improvement over the base algorithm
without ECN. But when compared to the baseline, the improvement added by ECN
is tiny compared to the improvement from enabling AQM.

Is it fair to say that the plain-AQM vs AQM+ECN difference is on the same
order as the differences between the different AQM algorithms?
Future research items (which others here may already have done, and which
would not be part of my 'elevator pitch'):

I believe that currently ECN triggers the exact same slowdown that a lost
packet does, and it may be appropriate to have the sender do a less drastic
slowdown.
It would be very interesting to provide some way for the application sending
the traffic to detect dropped packets and ECN responses. For example, a
streaming media source (especially an interactive one like video conferencing)
could adjust the bitrate that it is sending.
David Lang
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm