On Sat, 28 Mar 2015, Scheffenegger, Richard wrote:

> David,
>
> Perhaps you would care to provide some text to address the misconception
> that you pointed out? (To wait for a 100% fix as a 90% fix appears much
> less appealing, while the current state of the art is at 0%)
Ok, you put me on the spot :-) Here goes.
If you think that aqm-recommendations is not strongly enough worded, I think
this particular discussion (to AQM or not) really belongs there. The other
document (ecn-benefits) has a different target in arguing for going those last
10%...

So here is my "elevator pitch" on the problem. Feel free to take anything I say
here for any purpose, and I'm sure I'll get corrected for anything I am wrong
on.
Problem statement: Transmit buffers are needed to keep the link fully
utilized, but excessive buffering results in poor latency for all traffic.
This latency is frequently bad enough to cause some types of traffic to fail
entirely.
<link to more background goes here, including how separate benchmarks for
throughput and latency have misled people, "packet loss considered evil",
cheaper memory encouraging larger buffers, etc. Include tests like
netperf-wrapper and ping latency while under load, etc. Include examples where
buffers have resulted in latencies so long that packets are retransmitted
before the first copy gets to the destination>
Traditionally, transmit buffers have been sized to hold a fixed number of
packets. Due to the variation in packet sizes, it is impossible to tune this
value to keep the link fully utilized when small packets dominate the traffic
without making the queue large enough to cause latency problems when large
packets dominate the traffic.
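To make that concrete, here is a quick back-of-the-envelope sketch (the
1000-packet buffer and 100 Mbit/s link are assumed example numbers, not from
any particular device):

```python
# Assumed example numbers: a 1000-packet transmit buffer on a 100 Mbit/s link.
BUFFER_PACKETS = 1000
LINK_BPS = 100e6  # 100 Mbit/s

def worst_case_latency_ms(packet_bytes):
    """Time to drain a buffer full of packets of this one size."""
    return BUFFER_PACKETS * packet_bytes * 8 / LINK_BPS * 1000

small = worst_case_latency_ms(64)    # tiny packets (acks, DNS, VoIP)
large = worst_case_latency_ms(1500)  # full-size Ethernet frames

print(f"64-byte packets:   {small:.2f} ms of queue")
print(f"1500-byte packets: {large:.2f} ms of queue")
```

The same packet count that gives about 5 ms of worst-case queueing delay with
small packets gives 120 ms with large ones; no single packet-count value is
right for both.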
Shifting to byte queue limits, where queues are allowed to hold a variable
number of packets depending on how large they are, makes it possible to
manually tune the transmit buffer size to get good latency under all traffic
conditions at a given speed. However, this step forward revealed two
additional problems.
1. Whenever the data rate changes, this value needs to be manually changed
(multi-link paths lose a link, noise degrades the maximum throughput of a
link, etc.)

2. High-volume flows (i.e. bulk downloads) can starve other flows (DNS
lookups, VoIP, gaming, etc.). This happens because space in the queue is
allocated on a first-come-first-served basis, so the high-volume traffic fills
the queue (at which point it starts to be dropped), but all other traffic that
tries to arrive is also dropped. It turns out that these light flows tend to
have a larger effect on the user experience than the heavier flows, because
things tend to be serialized behind the lighter flows (a DNS lookup before
doing a large download, retrieving a small HTML page to find what additional
resources need to be fetched to display a page), or the user experience is
directly affected by the light flows (gaming lag, VoIP drops, etc.)
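Here is a toy simulation of that tail-drop starvation (all the numbers and
flow names are invented for illustration; it is not modeling any real device):

```python
from collections import deque

CAPACITY = 8                    # tail-drop FIFO, sized in packets
queue = deque()
dropped = {"heavy": 0, "light": 0}

def enqueue(flow):
    if len(queue) >= CAPACITY:
        dropped[flow] += 1      # tail drop: whoever arrives next loses
    else:
        queue.append(flow)

# The heavy flow sends faster than the link drains, so the queue is
# almost always full by the time an occasional light packet shows up.
for i in range(200):
    enqueue("heavy")
    if i % 20 == 0:
        enqueue("light")        # e.g. a DNS query or a VoIP frame
    if i % 3 == 0 and queue:
        queue.popleft()         # the link drains one packet per 3 arrivals

print(dropped)
```

The heavy flow still gets plenty of packets through (it owns the whole queue),
while almost every light packet is lost, even though the light flow asked for
a tiny fraction of the capacity.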
Active Queue Management addresses these problems by adapting the amount of
data that is buffered to match the transmission capacity of the link, and by
preventing high-volume flows from starving low-volume flows, without the need
to implement QoS classifications.
<insert link about how you can't trust QoS tags that are made by other
organizations, ways that it can be abused, etc>
This is possible because AQM algorithms don't have to drop the newly arriving
packet; the algorithm can decide to drop a packet from one of the heavy flows
rather than from one of the lightweight flows.
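As a sketch of that idea (this is not the real logic of any particular
algorithm, just the "pick on the heaviest flow" principle with invented
numbers):

```python
from collections import defaultdict, deque

LIMIT = 8                    # total packets allowed across all flows
flows = defaultdict(deque)   # one small queue per flow

def enqueue(flow, pkt):
    flows[flow].append(pkt)
    if sum(len(q) for q in flows.values()) > LIMIT:
        # Over the limit: drop from the head of the *longest* queue
        # rather than the packet that just arrived, so light flows
        # are never squeezed out by a heavy one.
        fattest = max(flows, key=lambda f: len(flows[f]))
        flows[fattest].popleft()

for i in range(35):
    enqueue("bulk", i)       # heavy download traffic
    if i % 10 == 0:
        enqueue("dns", i)    # occasional light-flow packet
```

Every drop lands on the bulk flow; all of the light flow's packets survive.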
<insert references to currently favored AQM options here, PIE, fq_codel, cake,
???. Also links to failed approaches>
Turning on AQM on every bottleneck link makes the Internet usable for
everyone, no matter what sort of application they are using.
<insert link on how to deal with equipment you can't configure by throttling
bandwidth before the bottleneck and/or doing ingress shaping of traffic>
While AQM makes the network usable, there is still additional room for
improvement. Dropping packets does result in the TCP senders slowing down, and
eventually stabilizing at around the right speed to keep the link fully
utilized, but the only way the senders have been able to detect the problem is
to discover that they have not received an ack for the traffic within the
allowed time. This causes a 'bubble' in the flow, as the dropped packet must
be retransmitted (and sometimes a significant amount of data after the dropped
packet that did make it to the destination, but could not be acked because of
the missing packet).
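To put rough numbers on the size of that bubble (the RTT, timeout, and link
speed below are assumed example values, just to give a feel for the scale):

```python
# Assumed example values for one flow through a congested bottleneck.
rtt_ms = 50      # round-trip time
rto_ms = 200     # retransmission timeout the sender must wait out
link_mbps = 100  # bottleneck capacity

# The flow stalls for the timeout, then needs one more round trip
# before the retransmitted packet is acked.
bubble_ms = rto_ms + rtt_ms

# How much link capacity passes by while this flow is stalled.
idle_capacity_kb = link_mbps * 1000 * bubble_ms / 1000 / 8

print(f"stall: ~{bubble_ms} ms, ~{idle_capacity_kb:.0f} KB of capacity")
```

A quarter-second stall on a fast link is a lot of wasted capacity, and it is
directly visible to the user as a hiccup.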
This "bubble" in the data flow can be almost entirely eliminated by enabling
ECN in the AQM algorithm. Instead of dropping a packet from a flow that has
negotiated ECN, the algorithm sets a congestion mark in the packet's IP header
and delivers it anyway; the receiver then echoes the mark back to the sender
in its acks. The sender can adapt right away, slowing down its new data,
without anything having to be retransmitted. This has two major effects: since
the marked packet still arrives, there is no delay waiting for replacement
data, and the packets following it don't get held up or re-sent either. By
configuring the AQM algorithm to mark a packet only at the point where it
would otherwise drop it, the effect of the congestion signal failing to reach
the sender (the marked packet or the ack echoing it runs into congestion and
gets dropped, some network device clears the mark, etc.) is that the
ECN-enabled case devolves to match the non-ECN case: congestion ends up being
signaled by an actual drop, which the sender detects just as if ECN was not
enabled.
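A sketch of that "mark where you would otherwise drop" rule (hypothetical
field names, and a simple tail threshold standing in for a real AQM's drop
decision):

```python
THRESHOLD = 8   # queue depth at which this toy AQM would start dropping

def on_arrival(queue, pkt):
    """Enqueue a packet, marking instead of dropping for ECN flows."""
    if len(queue) < THRESHOLD:
        queue.append(pkt)
        return "queued"
    if pkt.get("ect"):           # endpoints negotiated ECN for this flow
        pkt["ce"] = True         # set the congestion mark...
        queue.append(pkt)        # ...but still deliver the data
        return "marked"
    return "dropped"             # non-ECN flow: loss is the only signal
```

Because the mark is only applied where a drop would have happened, losing the
mark (or the ack that echoes it) just leaves the flow on the ordinary
loss-detection path.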
<insert link to possible problems that can happen here, including the potential
for an app to 'game' things if packets are marked at a different level than
when they are dropped.>
So, a very strong recommendation to enable Active Queue Management: while the
different algorithms have different advantages and levels of testing, even the
'worst' of the set results in a night-and-day improvement in usability
compared to unmanaged buffers.
Enabling ECN at the same point as dropping packets, as part of enabling any
AQM algorithm, results in a noticeable improvement over the base algorithm
without ECN. But when compared to the baseline, the improvement added by ECN
is tiny compared to the improvement from enabling AQM.

Is it fair to say that the plain-AQM vs AQM+ECN difference is on the same
order as the differences between the different AQM algorithms?
Future research items (which others here may already have done, and which
would not be part of my 'elevator pitch'):

I believe that currently ECN triggers the exact same slowdown that a lost
packet does, and it may be appropriate to have the sender do a less drastic
slowdown.
It would be very interesting to provide some way for the application sending
the traffic to detect dropped packets and ECN responses. For example, a
streaming media source (especially an interactive one like video conferencing)
could adjust the bitrate that it is sending.
David Lang
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm