oops, I did not mean to take this offlist.

On Wed, Jun 10, 2015 at 4:25 PM, Dave Taht <[email protected]> wrote:
> I care about this stuff, and I strongly support the need for aqm on
> high speed links also...
>
> but my own focus is mostly on shorter term stuff at much lower rates,
> like wifi, and I am relying on others on the list to take apart your
> analysis. I am sufficiently lazy to want a spreadsheet to check your
> assumptions and then to go simulate what actually happens in my
> testbeds.
>
> codel is a "safe" aqm. A big goal was to do no harm.
>
> Pie hits things harder, faster, when things get out of whack, but
> tolerates a much higher delay.
>
> In "cake" we are looking at various problems, notably the stanford
> solution for an "ideal" relationship between flows and BDP, packet
> peeling for bulk offloads, hitting things harder when certain targets
> are exceeded, and/or using ecn more wisely.
>
> http://www.bufferbloat.net/projects/codel/wiki/Cake
>
> Some of that work might migrate back to codel. An easy mod would be
> to up the count while tail-dropping, for example.
>
> And in all cases I don't think we will be applying single-queue aqm
> ideas this close to the "wire"; some additional latency for other
> processing is needed.
>
> Few of those doing codel believe in it, in its current form, as a
> single-queue aqm. It was one "piece" of the solution to bufferbloat -
> regulating tcp better across a wide range of rtts.
>
> A 5ms delay target (codel) or a 16ms one (pie) certainly seems like
> quite a lot at 100gbit, and trying further to size the available
> buffering to the number of flows seems like a good avenue for further
> research. I would love to find someone that could run cake at 10GigE.
>
> On Wed, Jun 10, 2015 at 1:58 PM, Agarwal, Anil <[email protected]>
> wrote:
>> To follow up on Steve's posting - here is a simple (simplistic)
>> analysis to estimate the time taken by CoDel to converge to the
>> right value of the interval I.
>> Perhaps similar analysis has been done elsewhere.
>>
>> Analysis of single-queue CoDel for a large number of TCP connections.
>>
>> Given:
>> Link capacity = C bps
>> Number of active TCP connections = n
>> Round trip time = rtt seconds
>> Packet size = PS bytes
>> CoDel initial interval = I0 seconds
>
> I0 is a bad choice of variable name. I initially interpreted it as 10
> seconds, which caused a BP increase. :)
>
>> Assume:
>> All TCP connections are long-lived.
>
> Not an assumption I would like to make. Do we have good numbers for
> core routers? The mix has certainly shifted since netflix. I note
> that a network primarily in slow start behaves much worse to an aqm
> trying to shoot at the right stuff.
>
>> At steady state, for every TCP conn, cwnd increases linearly from
>> W/2 to W by one packet per RTT, then drops to W/2 when a packet is
>> dropped or ECN-marked.
>>
>> Derive:
>> bps_per_conn = C / n
>> Conn max window size = W = bps_per_conn * rtt / (8 * PS) (packets)
>> Ideal drop interval per conn = DI1 = (W / 2) * rtt (seconds)
>> Ideal drop rate per conn = DR1 = 1 / DI1 (packets/sec)
>> Aggregate ideal drop rate = DR = DR1 * n (packets/sec)
>> Ideal CoDel interval = II = 1 / DR (seconds)
>>
>> CoDel packet drop rate increase per sec = DRI =~ 0.5 / I0^2
>> This can be derived from the derivative of the equation
>> I(n) = I0 / sqrt(n), where I(n) is the interval size after n
>> consecutive congested intervals.
>> Under sustained congestion, the packet drop rate increases by DRI
>> packets/second, every second.
>> DRI does not depend on C, n or rtt!
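Spelling that derivative out (my reconstruction; R(t) denotes the aggregate drop rate t seconds into a congested episode, n the number of drops so far):

```latex
% interval after n consecutive drops, and the resulting drop rate
I(n) = \frac{I_0}{\sqrt{n}} \quad\Rightarrow\quad
R(n) = \frac{1}{I(n)} = \frac{\sqrt{n}}{I_0}

% time of the n-th drop under sustained congestion
t(n) \approx \sum_{k=1}^{n} \frac{I_0}{\sqrt{k}} \approx 2 I_0 \sqrt{n}

% eliminating n: the drop rate ramps linearly in t
R(t) \approx \frac{t}{2 I_0^{2}}, \qquad
DRI = \frac{dR}{dt} = \frac{0.5}{I_0^{2}}
```

Since I0 is the only parameter left in the slope, DRI is indeed independent of C, n and rtt.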
>>
>> CoDel time for the interval to converge to II = T = DR / DRI
>> (seconds)
>>
>> or
>>
>> T = 32 * PS / C * (n * I0 / rtt)^2 (seconds)
>>
>> Comments:
>> The time T taken by CoDel to reach the correct I value is -
>> - inversely proportional to link capacity C
>> - proportional to the square of the number of active TCP connections
>> - proportional to the square of the initial interval value
>> - inversely proportional to the square of rtt.
>>
>> If link capacity increases and the number of TCP connections
>> increases correspondingly, then T will increase.
>>
>> During time 0 to T, the queue will likely remain full and packets
>> will experience tail-drops. Of course, one could argue that the
>> number of connections n is likely to change slowly, perhaps more
>> slowly than I converges (?).
>>
>> In steady state, large sudden changes in n will likely result in
>> tail-drops.
>
> In any state, large sudden changes in n will more likely result in
> tail-drops. Slow start with IW10 is nasty. Quic is doing 10-packet
> IWs and 22 packets more, paced.
>
> However, large sudden changes happen infrequently.
>
>> This analysis is probably not applicable when rtt > I0 or if W is
>> too small.
>>
>> Numerical examples:
>> 1. C = 1 Gbps, n = 1000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
>>    DRI = 50 packets/sec/sec
>>    W = 8.33 packets
>>    Drop rate DR = 2400 packets/sec
>>    T = 48 seconds
>>
>> 2. C = 10 Gbps, n = 10000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
>>    DRI = 50 packets/sec/sec
>>    W = 8.33 packets
>>    Drop rate DR = 24000 packets/sec
>>    T = 480 seconds
>>
>> 3. C = 10 Gbps, n = 11000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
>>    DRI = 50 packets/sec/sec
>>    W = 7.58 packets
>>    Drop rate DR = 29040 packets/sec
>>    T = 580 seconds
>>    (A 10% increase in the number of TCP connections increased T by
>>    100 seconds.)
>
> These all assume all tcps are at full rate, I think.
> The goal in life is to hit the fattest flows and get them to back
> off. Many tcps are rate-limited in places other than the core - well,
> the core is hopefully overprovisioned, except when it is not (as per
> the paper I posted earlier). Odds are you will hit the fatter flows
> with greater likelihood, with a corresponding dropoff in rate.
>
> If your argument is that 10 minutes is too long to converge to a more
> ideal rate from 0, I agree. 64 hardware queues coupled to 64
> instances of fq_codel with 1024 queues each may well ALSO be too many
> darn queues - but it would hopefully converge to a more ideal rate
> much faster with every change in n flows.
>
> That said, we end up with another question: how much buffering is
> enough when you have that many flows? That is an old question, and I
> do wish it were possible to have a device on the core, routinely
> saturated with real traffic, that we could be looking at as
> development of realistic aqm and fq strategies continues.
>
>>
>> This does seem to indicate that for core and edge routers that
>> handle a large number of connections, if the number of connections
>> changes by a large amount, then the convergence of the CoDel
>> interval value will likely take a correspondingly large amount of
>> time.
>>
>> This is probably true for other AQM schemes as well - is there any
>> published analysis on the subject?
>
> Scaling AQM technologies to 100GigE and beyond would be a very nice
> paper to have. :)
>
>> The example I have used may be a bit extreme, but I hope it
>> illustrates the issue Steve is pointing to.
>>
>> Regards,
>> Anil Agarwal
>> ViaSat Inc.
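Anil's formulas and examples above transcribe directly into a few lines of python (my sketch; note that PS is given in bytes and has to be converted to bits for the worked numbers to come out as shown):

```python
# Anil's convergence-time estimate, transcribed. PS is in bytes and is
# converted to bits where the rate arithmetic needs it.

def codel_convergence(C, n, rtt, PS, I0):
    """Return (W, DR, T): per-conn max cwnd in packets, aggregate ideal
    drop rate in packets/sec, and convergence time in seconds."""
    pkt_bits = PS * 8
    W = (C / n) * rtt / pkt_bits      # max window per conn, packets
    DI1 = (W / 2) * rtt               # ideal drop interval per conn, sec
    DR = n / DI1                      # aggregate ideal drop rate
    DRI = 0.5 / I0 ** 2               # drop-rate ramp, pkts/sec per sec
    return W, DR, DR / DRI

# Example 1: 1 Gbps, 1000 conns, 100 ms rtt, 1500-byte packets
W, DR, T = codel_convergence(1e9, 1000, 0.1, 1500, 0.1)
print(round(W, 2), round(DR), round(T))    # 8.33 2400 48

# Example 2 scales C and n by 10x: T grows tenfold
_, _, T2 = codel_convergence(10e9, 10000, 0.1, 1500, 0.1)
print(round(T2))                           # 480
```

Since T is proportional to n^2 / C, doubling the link rate alone halves T, while doubling the flow count alone quadruples it - the scaling Anil calls out.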
>>
>> -----Original Message-----
>> From: aqm [mailto:[email protected]] On Behalf Of Steven Blake
>> Sent: Tuesday, June 09, 2015 4:40 PM
>> To: Dave Taht
>> Cc: [email protected]
>> Subject: Re: [aqm] CoDel on high-speed links
>>
>> On Tue, 2015-06-09 at 12:44 -0700, Dave Taht wrote:
>>
>>> The below makes several mis-characterisations of codel in the first
>>> place, and then attempts to reason from there.
>>
>> Hmmm...
>>
>>> On Tue, Jun 9, 2015 at 9:11 AM, Steven Blake <[email protected]> wrote:
>>> > I have a question about how CoDel (as defined in
>>> > draft-ietf-aqm-codel-01) behaves on high-speed (e.g., >= 1 Gbps)
>>> > links. If this has been discussed before, please just point me in
>>> > the right direction.
>>> >
>>> > In the text below, I'm using "drop" to mean either packet discard
>>> > or ECN mark. I'm using (instantaneous) "drop frequency" to mean
>>> > the inverse of the interval between consecutive "drops" during a
>>> > congestion epoch, measured in drops/sec.
>>> >
>>> > The control law for CoDel computes the next time to drop a
>>> > packet, and is given as:
>>> >
>>> >   t + interval/sqrt(count)
>>> >
>>> > where t is the current time, interval is a value roughly
>>> > proportional to maximum RTT (recommended 100 msec), and count is
>>> > the cumulative number of drops during a congestion epoch.
>>>
>>> No. Count is just a variable to control the curve of the drop rate.
>>> It is not constantly incremented, either; it goes up and down based
>>> on how successful it is at controlling the flow(s), only
>>> incrementing while latency exceeds the target, and decrementing
>>> slightly after it stays below the target. The time spent below the
>>> target is not accounted for, so you might have a high "bang-bang"
>>> drop rate retained when something goes above the target from below.
>>>
>>> This subtlety is something people consistently miss and something I
>>> tried to elucidate in the first stanford talk.
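To make the count dynamics Dave describes concrete, here is a toy model of just the drop-scheduling state - a simplification of the draft's pseudocode (it omits the sojourn-time test that gates entering and leaving the dropping state, and the exact count-carryover rule below is one common variant, not gospel):

```python
import math

# Toy model of codel's drop scheduling. "count" persists across
# dropping episodes (reduced slightly on re-entry), which is the
# subtlety above: the drop rate restarts high, not from 1.

class CodelLaw:
    def __init__(self, interval=0.100):
        self.interval = interval
        self.count = 0          # drops scheduled in the current episode
        self.dropping = False
        self.drop_next = 0.0    # absolute time of the next drop

    def control_law(self, t):
        # drops get closer together as count rises
        return t + self.interval / math.sqrt(self.count)

    def enter_dropping(self, now):
        # re-enter with memory of the last episode: count carried over,
        # slightly reduced, rather than reset to 1
        self.dropping = True
        self.count = self.count - 2 if self.count > 2 else 1
        self.drop_next = self.control_law(now)

    def leave_dropping(self):
        self.dropping = False   # note: count is NOT reset here

    def on_drop(self):
        self.count += 1
        self.drop_next = self.control_law(self.drop_next)
```

A queue that dips below target and pops back above it re-enters via enter_dropping() with a large count, so the "bang-bang" drop rate resumes almost where it left off.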
>>
>> I specifically mentioned "during a congestion epoch", but let me be
>> more precise: count is continuously incremented during an extended
>> period where latency exceeds the target (perhaps because CoDel isn't
>> yet dropping hard enough). Correct?
>>
>> The fact that the drop frequency doesn't ramp down quickly when
>> congestion is momentarily relieved is good, but doesn't help if it
>> takes forever for the algorithm to ramp up to an effective drop
>> frequency (i.e., something greater than 1 drop/flow/minute).
>>
>>> > It is not hard to see that drop frequency increases with
>>> > sqrt(count). At the first drop, the frequency is 10 drops/sec;
>>> > after 100 drops it is 100 drops/sec; after 1000 drops it is 316
>>> > drops/sec.
>>> >
>>> > On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel
>>> > immediately starts dropping 1% of packets and ramps up to ~10%
>>> > after 100 drops (1.86 secs).
>>>
>>> No, it will wait 100ms after stuff first exceeds the target, then
>>> progressively shoot harder based on the progress of
>>> interval/sqrt(count).
>>
>> Ok. At the first drop it is dropping at a rate of 1 packet/100 msec
>> == 10 drops/sec and ramps up from there. At the 100th drop it is
>> dropping at a rate of 1 packet per 100 msec/sqrt(100) == 1 packet/10
>> msec == 100 drops/sec. This just so happens to occur after 1.86 secs.
>>
>> Aside: as described, CoDel's drop frequency during a congestion
>> epoch increases approximately linearly with time (at a rate of about
>> 50 drops/sec^2 when interval = 100 msec).
>>
>>> Secondly, people have this tendency to measure full-size packets,
>>> or a 1k average packet. The reality is a dynamic range of 64 bytes
>>> to 64k (gso/tso/gro offloads). So bytes is a far better proxy than
>>> packets in order to think about this properly.
>>>
>>> Offloads of various sorts bulking up packet sizes have been a
>>> headache.
>>> I favor reducing mss on highly congested, underbuffered links (and
>>> bob favors sub-packet windows) to keep the "signal strength" up.
>>>
>>> The original definition of a packet (circa 1962) was 1000 bits,
>>> with up to 8 fragments. I do wish the materials that were the
>>> foundation of packet behavior were online somewhere...
>>
>> I don't see how this has anything to do with the text of the draft
>> or my questions.
>>
>>> > This seems like a reasonable range. On a 10 GE link serving 2.5
>>> > Mpps on average, CoDel would only drop 0.013% of packets after
>>> > 1000 drops (which would occur after 6.18 secs).
>>>
>>> I am allergic to averages as a statistic in the network measurement
>>> case.
>>>
>>> > This doesn't seem to be very effective. It's possible to reduce
>>> > interval to ramp up drop frequency more quickly, but that is
>>> > counter-intuitive because interval should be roughly proportional
>>> > to maximum RTT, which is link-speed independent.
>>>
>>> Except that tcps drop their rates by (typically) half on a drop,
>>> and it is a matter of debate as to when they do on CE.
>>
>> Ex/ 10 GE link, ~10K flows (average). During a congestion epoch,
>> CoDel with interval = 100 msec reaches a drop rate of 257
>> packets/sec after 5 secs. How many flows is that effectively
>> managing?
>>
>>> > Unless I am mistaken, it appears that the control law should be
>>> > normalized in some way to average packet rate. On a high-speed
>>> > link, it might be common to drop multiple packets per msec, so it
>>> > also isn't clear to me whether the drop frequency needs to be
>>> > recalculated on every drop, or whether it could be recalculated
>>> > over a shorter interval (e.g., 5 msec).
>>>
>>> Pie took the approach of sampling, setting a rate for shooting,
>>> over a 16ms interval. That's pretty huge, but also low-cost in some
>>> hardware.
>>>
>>> Codel's timestamp-per-packet control law is continuous (but you do
>>> need to have a cheap packet-timestamping ability).
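Steve's checkpoints (100th drop at ~1.86 s, and ~257 drops/sec after 5 s on the 10 GE example) are easy to reproduce by iterating the control law directly (my sketch; assumes sustained congestion in a single queue, interval = 100 ms, count incrementing on every drop):

```python
import math

# Under sustained congestion the k-th drop happens interval/sqrt(k)
# after the (k-1)-th, so drop times and rates follow directly.
def time_of_drop(k, interval=0.100):
    return sum(interval / math.sqrt(i) for i in range(1, k + 1))

def rate_at_drop(k, interval=0.100):
    return math.sqrt(k) / interval          # instantaneous drops/sec

# the 4 Mbps example's checkpoints
print(round(time_of_drop(100), 2), round(rate_at_drop(100)))  # 1.86 100

# the 10 GE example: drops accumulated after 5 s of congestion
k = 1
while time_of_drop(k + 1) <= 5.0:
    k += 1
print(k, round(rate_at_drop(k)))        # ~660 drops, ~257 drops/sec
# spread across ~10K flows, that signals each flow roughly once
# every 40 s - far short of once per rtt per flow
```

This also makes the "aside" visible: rate(t) ~= t / (2 * interval^2), i.e. a linear ramp of 50 drops/sec^2 at interval = 100 ms, regardless of how many flows need a signal.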
>>>
>>> Certainly in all cases more work is needed to address the problems
>>> 100gbps rates pose in general, and it is not all just queueing
>>> theory! A small packet is .62 *ns* in that regime. A benefit of fq
>>> in this case is that you can parallelize fib table lookups across
>>> multiple processors/caches, and of fq_codel that all the codels
>>> operate independently.
>>
>> High-speed metro/core links need AQM, too. I believe that
>> draft-ietf-aqm-codel-01 doesn't work for these links in the general
>> case (e.g., large #flows, recommended value for interval, no FQ).
>> IMHO the draft should say something about this, or should adjust the
>> algorithm accordingly.
>>
>> Regards,
>>
>> // Steve
>>
>> _______________________________________________
>> aqm mailing list
>> [email protected]
>> https://www.ietf.org/mailman/listinfo/aqm
>
> --
> Dave Täht
> What will it take to vastly improve wifi for everyone?
> https://plus.google.com/u/0/explore/makewififast
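On the per-packet time budget mentioned above: the bare serialization arithmetic (my numbers; preamble and inter-frame gap ignored) puts a small packet at a few ns at 100 Gbit/s rather than a fraction of one:

```python
# Serialization time of one packet at line rate, in nanoseconds.
def pkt_ns(size_bytes, rate_bps):
    return size_bytes * 8 / rate_bps * 1e9

print(pkt_ns(64, 100e9))     # ~5.12 ns for a minimum-size packet
print(pkt_ns(1500, 100e9))   # ~120 ns for a full-size packet
# Either way, per-packet aqm work at 100gbps has a budget of a handful
# of ns unless it is pipelined or parallelized (as fq permits).
```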
--
Dave Täht
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast

_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
