I certainly would like you to try factoring fq_codel's effects into your calculations. 1024 queues is the default, with up to 64k queues possible. If you assume 10k flows, you can reasonably assume a fairly even distribution across the available queues, with 1k codels compensating madly. I tend to think most of the faster future world is going to have multiple queues, with every 10GigE device I am familiar with coming with 64 or more.
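To put a rough number on "pretty equal distribution": hashing flows into queues is a classic balls-in-bins situation. A quick python sketch of mine (it models the hash as uniform random assignment, not the kernel's actual flow hashing) shows how 10k flows spread over the default 1024 queues:

```python
# Rough sketch: how evenly do 10k flows land in fq_codel's default
# 1024 queues, if we model the stochastic hash as uniform random?
import random

random.seed(1)
NQUEUES, NFLOWS = 1024, 10_000

bins = [0] * NQUEUES
for _ in range(NFLOWS):
    bins[random.randrange(NQUEUES)] += 1

occupied = sum(1 for b in bins if b > 0)
print("mean flows/queue:", NFLOWS / NQUEUES)   # 9.765625
print("max flows/queue: ", max(bins))
print("queues in use:   ", occupied, "of", NQUEUES)
```

The point being that each codel instance ends up managing a handful of flows, not the 10k aggregate.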
Secondly, you appear to be thinking in terms of very abrupt changes in demand, which are not characteristic of a core link. (They are certainly characteristic of a route flap, however, so I am not dissing your work.) My favorite paper of last year was this one, where changes in demand had timescales of hours, not seconds, and I would have loved to have been in a position to try out every new aqm and fq+aqm algorithm on a core-like router (many, many 10GigE ports is totally feasible on x86) against the workloads here:

http://www.caida.org/publications/papers/2014/measurement_analysis_internet_interconnection/measurement_analysis_internet_interconnection.pdf

Lastly, spreadsheets are easier to deal with than emails; if you want to put one up with the various algorithms compared, that would be a goodness!

The "all TCP sessions are long-lived" assumption is a truly fatal error in thinking about how this stuff works, IMHO, but I realize that it is hard to reason about.

On Wed, Jun 10, 2015 at 1:58 PM, Agarwal, Anil <[email protected]> wrote:
> To follow up on Steve's posting - here is a simple (simplistic) analysis
> to estimate the time taken by CoDel to converge to the right value of
> the interval I. Perhaps similar analysis has been done elsewhere.
>
> Analysis of single-queue CoDel for a large number of TCP connections.
>
> Given:
> Link capacity = C bps
> Number of active TCP connections = n
> Round-trip time = rtt seconds
> Packet size = PS bytes
> CoDel initial interval = I0 seconds
>
> Assume:
> All TCP connections are long-lived.
>
> At steady state, for every TCP connection, cwnd increases linearly from
> W/2 to W by one packet per RTT, then drops back to W/2 when a packet is
> dropped or ECN-marked.
>
> Derive:
> bps_per_conn = C / n
> Conn max window size = W = bps_per_conn * rtt / (8 * PS) (packets)
> Ideal drop interval per conn = DI1 = (W / 2) * rtt (seconds)
> Ideal drop rate per conn = DR1 = 1 / DI1 (packets/sec)
> Aggregate ideal drop rate = DR = DR1 * n (packets/sec)
> Ideal CoDel interval = II = 1 / DR (seconds)
>
> CoDel packet drop rate increase per sec = DRI =~ 0.5 / I0^2
> This can be derived from the derivative of the equation I(n) = I0 /
> sqrt(n), where I(n) is the interval size after n consecutive congested
> intervals. Under sustained congestion, the packet drop rate increases
> by DRI packets/sec, every second.
> DRI does not depend on C, n or rtt!
>
> CoDel time for the interval to converge to II = T = DR / DRI (seconds)
>
> or, expanding (the factor of 8 converts PS bytes to bits),
>
> T = 32 * PS / C * (n * I0 / rtt)^2 (seconds)
>
> Comments:
> The time T taken by CoDel to reach the correct I value is -
> - Inversely proportional to link capacity C
> - Proportional to the square of the number of active TCP connections
> - Proportional to the square of the initial interval value
> - Inversely proportional to the square of rtt
>
> If link capacity increases and the number of TCP connections increases
> correspondingly, then T will increase.
>
> During time 0 to T, the queue will likely remain full and packets will
> experience tail-drops. Of course, one could argue that the number of
> connections n is likely to increase slowly, perhaps more slowly than
> the convergence time of I (?).
>
> In steady state, large sudden changes in n will likely result in
> tail-drops.
>
> This analysis is probably not applicable when rtt > I0 or if W is too
> small.
>
> Numerical examples:
> 1. C = 1 Gbps, n = 1000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
> DRI = 50 packets/sec/sec
> W = 8.33 packets
> Drop rate DR = 2400 packets/sec
> T = 48 seconds
>
> 2. C = 10 Gbps, n = 10000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
> DRI = 50 packets/sec/sec
> W = 8.33 packets
> Drop rate DR = 24000 packets/sec
> T = 480 seconds
>
> 3. C = 10 Gbps, n = 11000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
> DRI = 50 packets/sec/sec
> W = 7.58 packets
> Drop rate DR = 29040 packets/sec
> T = 580 seconds
> (A 10% increase in the number of TCP connections caused T to increase
> by 100 seconds.)
>
> This does seem to indicate that for core and edge routers that handle
> a large number of connections, if the number of connections changes by
> a large amount, then the convergence of the CoDel interval value will
> likely take a correspondingly large amount of time.
>
> This is probably true for other AQM schemes as well - is there any
> published analysis on the subject?
>
> The example I have used may be a bit extreme, but I hope it illustrates
> the issue Steve is pointing to.
>
> Regards,
> Anil Agarwal
> ViaSat Inc.
>
> -----Original Message-----
> From: aqm [mailto:[email protected]] On Behalf Of Steven Blake
> Sent: Tuesday, June 09, 2015 4:40 PM
> To: Dave Taht
> Cc: [email protected]
> Subject: Re: [aqm] CoDel on high-speed links
>
> On Tue, 2015-06-09 at 12:44 -0700, Dave Taht wrote:
>
>> The below makes several mis-characterisations of codel in the first
>> place, and then attempts to reason from there.
>
> Hmmm...
>
>> On Tue, Jun 9, 2015 at 9:11 AM, Steven Blake <[email protected]> wrote:
>> > I have a question about how CoDel (as defined in
>> > draft-ietf-aqm-codel-01) behaves on high-speed (e.g., >= 1 Gbps)
>> > links. If this has been discussed before, please just point me in
>> > the right direction.
>> >
>> > In the text below, I'm using "drop" to mean either packet
>> > discard/ECN mark.
>> > I'm using (instantaneous) "drop frequency" to mean the inverse of
>> > the interval between consecutive "drops" during a congestion epoch,
>> > measured in drops/sec.
>> >
>> > The control law for CoDel computes the next time to drop a packet,
>> > and is given as:
>> >
>> >     t + interval/sqrt(count)
>> >
>> > where t is the current time, interval is a value roughly
>> > proportional to maximum RTT (recommended 100 msec), and count is
>> > the cumulative number of drops during a congestion epoch.
>>
>> No. Count is just a variable to control the curve of the drop rate.
>> It is not constantly incremented, either; it goes up and down based
>> on how successful it is at controlling the flow(s), only incrementing
>> while latency exceeds the target, decrementing slightly after it
>> stays below the target. The time spent below the target is not
>> accounted for, so you might have a high "bang-bang" drop rate
>> retained when something goes above the target from below.
>>
>> This subtlety is something people consistently miss and something I
>> tried to elucidate in the first Stanford talk.
>
> I specifically mentioned "during a congestion epoch", but let me be
> more precise: count is continuously incremented during an extended
> period where latency exceeds the target (perhaps because CoDel isn't
> yet dropping hard enough). Correct?
>
> The fact that the drop frequency doesn't ramp down quickly when
> congestion is momentarily relieved is good, but doesn't help if it
> takes forever for the algorithm to ramp up to an effective drop
> frequency (i.e., something greater than 1 drop/flow/minute).
>
>> > It is not hard to see that the drop frequency increases with
>> > sqrt(count). At the first drop, the frequency is 10 drops/sec;
>> > after 100 drops it is 100 drops/sec; after 1000 drops it is 316
>> > drops/sec.
>> >
>> > On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel
>> > immediately starts dropping 1% of packets and ramps up to ~10%
>> > after 100 drops (1.86 secs).
>>
>> No, it will wait 100 ms after stuff first exceeds the target, then
>> progressively shoot harder based on the progress of
>> interval/sqrt(count).
>
> Ok. At the first drop it is dropping at a rate of 1 packet/100 msec ==
> 10 drops/sec and ramps up from there. At the 100th drop it is dropping
> at a rate of 100 msec/sqrt(100) == 1 packet/10 msec == 100 drops/sec.
> This just so happens to occur after 1.8 secs.
>
> Aside: as described, CoDel's drop frequency during a congestion epoch
> increases approximately linearly with time (at a rate of about 50
> drops/sec^2 when interval = 100 msec).
>
>> Secondly, people have this tendency to measure full-size packets, or
>> a 1k average packet. The reality is a dynamic range of 64 bytes to
>> 64k (gso/tso/gro offloads). So bytes is a far better proxy than
>> packets for thinking about this properly.
>>
>> Offloads of various sorts bulking up packet sizes have been a
>> headache. I favor reducing the MSS on highly congested, underbuffered
>> links (and Bob favors sub-packet windows) to keep the "signal
>> strength" up.
>>
>> The original definition of a packet (circa 1962) was 1000 bits, with
>> up to 8 fragments. I do wish the materials that were the foundation
>> of packet behavior were online somewhere...
>
> I don't see how this has anything to do with the text of the draft or
> my questions.
>
>> > This seems like a reasonable range. On a 10 GE link serving 2.5
>> > Mpps on average, CoDel would only drop 0.013% of packets after 1000
>> > drops (which would occur after 6.18 secs).
>>
>> I am allergic to averages as a statistic in the network measurement
>> case.
>>
>> > This doesn't seem to be very effective. It's possible to reduce
>> > interval to ramp up the drop frequency more quickly, but that is
>> > counter-intuitive because interval should be roughly proportional
>> > to maximum RTT, which is link-speed independent.
>>
>> Except that TCPs drop their rates by (typically) half on a drop, and
>> it is a matter of debate as to when they do on CE.
>
> Ex/ 10 GE link, ~10K flows (average). During a congestion epoch, CoDel
> with interval = 100 msec starts dropping 257 packets/sec after 5 secs.
> How many flows is that effectively managing?
>
>> > Unless I am mistaken, it appears that the control law should be
>> > normalized in some way to the average packet rate. On a high-speed
>> > link, it might be common to drop multiple packets per msec, so it
>> > also isn't clear to me whether the drop frequency needs to be
>> > recalculated on every drop, or whether it could be recalculated
>> > over a shorter interval (e.g., 5 msec).
>>
>> PIE took the approach of sampling, setting a rate for shooting, over
>> a 16 ms interval. That's pretty huge, but also low cost in some
>> hardware.
>>
>> Codel's timestamp-per-packet control law is continuous (but you do
>> need to have a cheap packet-timestamping ability).
>>
>> Certainly in all cases more work is needed to address the problems
>> 100 Gbps rates have in general, and it is not just all queue theory!
>> A small packet is .62 *ns* in that regime. A benefit of fq in this
>> case is that you can parallelize fib table lookups across multiple
>> processors/caches, and of fq_codel is that all the codels operate
>> independently.
>
> High-speed metro/core links need AQM, too. I believe that
> draft-ietf-aqm-codel-01 doesn't work for these links in the general
> case (e.g., large #flows, recommended value for interval, no FQ).
> IMHO the draft should say something about this, or should adjust the
> algorithm accordingly.
>
> Regards,
>
> // Steve
>
> _______________________________________________
> aqm mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/aqm

--
Dave Täht
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
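P.S. For those that want to check Anil's DRI =~ 0.5 / I0^2 figure without doing the calculus, here is a little python sketch of mine. It assumes sustained congestion with count incrementing on every drop, which is the same simplification as in the analysis; it just walks the codel drop schedule and bins drops per second:

```python
# Walk CoDel's drop schedule under sustained congestion and measure how
# fast the drop rate ramps.  Prediction: ~0.5 / I0^2 drops/sec per sec.
I0 = 0.100                      # initial interval, seconds (the 100 ms default)

t, count, drop_times = 0.0, 1, []
while t < 10.0:                 # 10 seconds of sustained congestion
    t += I0 / count ** 0.5      # next drop at t + I0/sqrt(count)
    count += 1
    drop_times.append(t)

# drops falling in each one-second window
rates = [sum(1 for d in drop_times if s <= d < s + 1) for s in range(10)]
slope = [rates[i + 1] - rates[i] for i in range(9)]

print("drops/sec per window:", rates)
print("ramp per second:     ", slope)          # each ~50
print("predicted 0.5/I0^2 = ", 0.5 / I0 ** 2)  # 50.0
```

Note how the drop rate grows linearly in time, which is exactly the slow-ramp behavior Steve is worried about at high flow counts.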

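P.P.S. And a companion python sketch of Anil's formulas themselves (my transcription of his derivation, with PS in bytes, hence the factor of 8), which reproduces his three numerical examples:

```python
# Convergence-time estimate for single-queue CoDel, per Anil's analysis.
#   C   link capacity (bps),   n   active TCP connections,
#   rtt round-trip time (s),   PS  packet size (bytes),
#   I0  initial interval (s)
def codel_convergence_time(C, n, rtt, PS, I0):
    W = (C / n) * rtt / (8 * PS)    # per-conn max cwnd, packets
    DR = 2 * n / (W * rtt)          # aggregate ideal drop rate, pkts/sec
    DRI = 0.5 / I0 ** 2             # drop-rate ramp, pkts/sec per second
    return W, DR, DR / DRI          # last value is T, seconds

for C, n in [(1e9, 1_000), (10e9, 10_000), (10e9, 11_000)]:
    W, DR, T = codel_convergence_time(C, n, rtt=0.100, PS=1500, I0=0.100)
    print(f"C={C:.0e}  n={n:>5}  W={W:.2f} pkts  DR={DR:.0f}/s  T={T:.1f} s")
```

Plugging in other capacities and flow counts is then a one-liner, which beats redoing the arithmetic by email.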