oops, I did not mean to take this offlist.

On Wed, Jun 10, 2015 at 4:25 PM, Dave Taht <[email protected]> wrote:
> I care about this stuff, and I strongly support the need for aqm on
> high speed links also...
>
> but my own focus is mostly on shorter term stuff at much lower rates,
> like wifi, and I am relying on others on the list to take apart your
> analysis. I am sufficiently lazy to want a spreadsheet to check your
> assumptions and then to go simulate what actually happens in my
> testbeds.
>
> codel is a "safe" aqm. A big goal was to do no harm.
>
> Pie hits things harder, faster, when things get out of whack, but
> tolerates a much higher delay.
>
> In "cake" we are looking at various problems, notably the stanford
> solution for an "ideal" relationship between flows and BDP, packet
> peeling for bulk offloads, hitting things harder when certain targets
> are exceeded, and/or using ecn more wisely.
>
> http://www.bufferbloat.net/projects/codel/wiki/Cake
>
> Some of that work might migrate back to codel. An easy mod would be
> to up the count while tail-dropping, for example.
>
> And in all cases I don't think we will be applying single-queue aqm
> ideas this close to the "wire"; some additional latency for other
> processing is needed.
>
> Few of those doing codel believe in it, in its current form, as a
> single-queue aqm. It was one "piece" of the solution to bufferbloat -
> regulating tcp better across a wide range of rtts.
>
> A 5ms delay target (codel) or a 16ms one (pie) certainly seems like
> quite a lot at 100gbit, and trying further to size the available
> buffering to the number of flows seems like a good avenue for further
> research. I would love to find someone that could run cake at 10GigE.
>
> On Wed, Jun 10, 2015 at 1:58 PM, Agarwal, Anil <[email protected]>
> wrote:
>> To follow up on Steve's posting - here is a simple (simplistic)
>> analysis to estimate the time taken by CoDel to converge to the
>> right value of the interval I.
>> Perhaps similar analysis has been done elsewhere.
>>
>> Analysis of single-queue CoDel for a large number of TCP connections.
>>
>> Given:
>> Link capacity = C bps
>> Number of active TCP connections = n
>> Round trip time = rtt seconds
>> Packet size = PS bytes
>> CoDel initial interval = I0 seconds
>
> I0 is a bad choice of variable name. I initially interpreted it as 10
> seconds, which caused a BP increase. :)
>
>> Assume:
>> All TCP connections are long-lived.
>
> Not an assumption I would like to make. Do we have good numbers for
> core routers? The mix has certainly shifted since netflix. I note
> that a network primarily in slow start behaves much worse to an aqm
> trying to shoot at the right stuff.
>
>> At steady state, for every TCP conn, cwnd increases linearly from
>> W/2 to W by one packet per RTT, then drops to W/2 when a packet is
>> dropped or ECN-marked.
>>
>> Derive:
>> bps_per_conn = C / n
>> Conn max window size = W = bps_per_conn * rtt / (8 * PS) (packets)
>> Ideal drop interval per conn = DI1 = (W / 2) * rtt (seconds)
>> Ideal drop rate per conn = DR1 = 1 / DI1 (packets/sec)
>> Aggregate ideal drop rate = DR = DR1 * n (packets/sec)
>> Ideal CoDel interval = II = 1 / DR (seconds)
>>
>> CoDel packet drop rate increase per sec = DRI =~ 0.5 / I0^2
>> This can be derived from the derivative of the equation
>> I(n) = I0 / sqrt(n), where I(n) is the interval size after n
>> consecutive congested intervals.
>> Under sustained congestion, the packet drop rate increases by DRI
>> packets/second, every second.
>> DRI does not depend on C, n or rtt!
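Spelling that derivative out (my reconstruction; R(t) denotes the aggregate drop rate t seconds into a congested episode, n the number of drops so far):

```latex
% interval after n consecutive drops, and the resulting drop rate
I(n) = \frac{I_0}{\sqrt{n}} \quad\Rightarrow\quad
R(n) = \frac{1}{I(n)} = \frac{\sqrt{n}}{I_0}

% time of the n-th drop under sustained congestion
t(n) \approx \sum_{k=1}^{n} \frac{I_0}{\sqrt{k}} \approx 2 I_0 \sqrt{n}

% eliminating n: the drop rate ramps linearly in t
R(t) \approx \frac{t}{2 I_0^{2}}, \qquad
DRI = \frac{dR}{dt} = \frac{0.5}{I_0^{2}}
```

Since I0 is the only parameter left in the slope, DRI is indeed independent of C, n and rtt.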
>>
>> CoDel time for the interval to converge to II = T = DR / DRI
>> (seconds)
>>
>> or
>>
>> T = 32 * PS / C * (n * I0 / rtt)^2 (seconds)
>>
>> Comments:
>> The time T taken by CoDel to reach the correct I value is -
>> - inversely proportional to link capacity C
>> - proportional to the square of the number of active TCP connections
>> - proportional to the square of the initial interval value
>> - inversely proportional to the square of rtt.
>>
>> If link capacity increases and the number of TCP connections
>> increases correspondingly, then T will increase.
>>
>> During time 0 to T, the queue will likely remain full and packets
>> will experience tail-drops. Of course, one could argue that the
>> number of connections n is likely to change slowly, perhaps more
>> slowly than I converges (?).
>>
>> In steady state, large sudden changes in n will likely result in
>> tail-drops.
>
> In any state, large sudden changes in n will more likely result in
> tail-drops. Slow start with IW10 is nasty. Quic is doing 10-packet
> IWs and 22 packets more, paced.
>
> However, large sudden changes happen infrequently.
>
>> This analysis is probably not applicable when rtt > I0 or if W is
>> too small.
>>
>> Numerical examples:
>> 1. C = 1 Gbps, n = 1000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
>>    DRI = 50 packets/sec/sec
>>    W = 8.33 packets
>>    Drop rate DR = 2400 packets/sec
>>    T = 48 seconds
>>
>> 2. C = 10 Gbps, n = 10000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
>>    DRI = 50 packets/sec/sec
>>    W = 8.33 packets
>>    Drop rate DR = 24000 packets/sec
>>    T = 480 seconds
>>
>> 3. C = 10 Gbps, n = 11000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
>>    DRI = 50 packets/sec/sec
>>    W = 7.58 packets
>>    Drop rate DR = 29040 packets/sec
>>    T = 580 seconds
>>    (A 10% increase in the number of TCP connections increased T by
>>    100 seconds.)
>
> These all assume all tcps are at full rate, I think.
> The goal in life is to hit the fattest flows and get them to back
> off. Many tcps are rate-limited in places other than the core - well,
> the core is hopefully overprovisioned, except when it is not (as per
> the paper I posted earlier). Odds are you will hit the fatter flows
> with greater likelihood, with a corresponding dropoff in rate.
>
> If your argument is that 10 minutes is too long to converge to a more
> ideal rate from 0, I agree. 64 hardware queues coupled to 64
> instances of fq_codel with 1024 queues each may well ALSO be too many
> darn queues - but it would hopefully converge to a more ideal rate
> much faster with every change in n flows.
>
> That said, we end up with another question: how much buffering is
> enough when you have that many flows? That is an old question, and I
> do wish it were possible to have a device on the core, routinely
> saturated with real traffic, that we could be looking at as
> development of realistic aqm and fq strategies continues.
>
>>
>> This does seem to indicate that for core and edge routers that
>> handle a large number of connections, if the number of connections
>> changes by a large amount, then the convergence of the CoDel
>> interval value will likely take a correspondingly large amount of
>> time.
>>
>> This is probably true for other AQM schemes as well - is there any
>> published analysis on the subject?
>
> Scaling AQM technologies to 100GigE and beyond would be a very nice
> paper to have. :)
>
>> The example I have used may be a bit extreme, but I hope it
>> illustrates the issue Steve is pointing to.
>>
>> Regards,
>> Anil Agarwal
>> ViaSat Inc.
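Anil's formulas and examples above transcribe directly into a few lines of python (my sketch; note that PS is given in bytes and has to be converted to bits for the worked numbers to come out as shown):

```python
# Anil's convergence-time estimate, transcribed. PS is in bytes and is
# converted to bits where the rate arithmetic needs it.

def codel_convergence(C, n, rtt, PS, I0):
    """Return (W, DR, T): per-conn max cwnd in packets, aggregate ideal
    drop rate in packets/sec, and convergence time in seconds."""
    pkt_bits = PS * 8
    W = (C / n) * rtt / pkt_bits      # max window per conn, packets
    DI1 = (W / 2) * rtt               # ideal drop interval per conn, sec
    DR = n / DI1                      # aggregate ideal drop rate
    DRI = 0.5 / I0 ** 2               # drop-rate ramp, pkts/sec per sec
    return W, DR, DR / DRI

# Example 1: 1 Gbps, 1000 conns, 100 ms rtt, 1500-byte packets
W, DR, T = codel_convergence(1e9, 1000, 0.1, 1500, 0.1)
print(round(W, 2), round(DR), round(T))    # 8.33 2400 48

# Example 2 scales C and n by 10x: T grows tenfold
_, _, T2 = codel_convergence(10e9, 10000, 0.1, 1500, 0.1)
print(round(T2))                           # 480
```

Since T is proportional to n^2 / C, doubling the link rate alone halves T, while doubling the flow count alone quadruples it - the scaling Anil calls out.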
>>
>> -----Original Message-----
>> From: aqm [mailto:[email protected]] On Behalf Of Steven Blake
>> Sent: Tuesday, June 09, 2015 4:40 PM
>> To: Dave Taht
>> Cc: [email protected]
>> Subject: Re: [aqm] CoDel on high-speed links
>>
>> On Tue, 2015-06-09 at 12:44 -0700, Dave Taht wrote:
>>
>>> The below makes several mis-characterisations of codel in the first
>>> place, and then attempts to reason from there.
>>
>> Hmmm...
>>
>>> On Tue, Jun 9, 2015 at 9:11 AM, Steven Blake <[email protected]> wrote:
>>> > I have a question about how CoDel (as defined in
>>> > draft-ietf-aqm-codel-01) behaves on high-speed (e.g., >= 1 Gbps)
>>> > links. If this has been discussed before, please just point me in
>>> > the right direction.
>>> >
>>> > In the text below, I'm using "drop" to mean either packet discard
>>> > or ECN mark. I'm using (instantaneous) "drop frequency" to mean
>>> > the inverse of the interval between consecutive "drops" during a
>>> > congestion epoch, measured in drops/sec.
>>> >
>>> > The control law for CoDel computes the next time to drop a
>>> > packet, and is given as:
>>> >
>>> >   t + interval/sqrt(count)
>>> >
>>> > where t is the current time, interval is a value roughly
>>> > proportional to maximum RTT (recommended 100 msec), and count is
>>> > the cumulative number of drops during a congestion epoch.
>>>
>>> No. Count is just a variable to control the curve of the drop rate.
>>> It is not constantly incremented, either; it goes up and down based
>>> on how successful it is at controlling the flow(s), only
>>> incrementing while latency exceeds the target, and decrementing
>>> slightly after it stays below the target. The time spent below the
>>> target is not accounted for, so you might have a high "bang-bang"
>>> drop rate retained when something goes above the target from below.
>>>
>>> This subtlety is something people consistently miss and something I
>>> tried to elucidate in the first stanford talk.
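To make the count dynamics Dave describes concrete, here is a toy model of just the drop-scheduling state - a simplification of the draft's pseudocode (it omits the sojourn-time test that gates entering and leaving the dropping state, and the exact count-carryover rule below is one common variant, not gospel):

```python
import math

# Toy model of codel's drop scheduling. "count" persists across
# dropping episodes (reduced slightly on re-entry), which is the
# subtlety above: the drop rate restarts high, not from 1.

class CodelLaw:
    def __init__(self, interval=0.100):
        self.interval = interval
        self.count = 0          # drops scheduled in the current episode
        self.dropping = False
        self.drop_next = 0.0    # absolute time of the next drop

    def control_law(self, t):
        # drops get closer together as count rises
        return t + self.interval / math.sqrt(self.count)

    def enter_dropping(self, now):
        # re-enter with memory of the last episode: count carried over,
        # slightly reduced, rather than reset to 1
        self.dropping = True
        self.count = self.count - 2 if self.count > 2 else 1
        self.drop_next = self.control_law(now)

    def leave_dropping(self):
        self.dropping = False   # note: count is NOT reset here

    def on_drop(self):
        self.count += 1
        self.drop_next = self.control_law(self.drop_next)
```

A queue that dips below target and pops back above it re-enters via enter_dropping() with a large count, so the "bang-bang" drop rate resumes almost where it left off.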
>>
>> I specifically mentioned "during a congestion epoch", but let me be
>> more precise: count is continuously incremented during an extended
>> period where latency exceeds the target (perhaps because CoDel isn't
>> yet dropping hard enough). Correct?
>>
>> The fact that the drop frequency doesn't ramp down quickly when
>> congestion is momentarily relieved is good, but doesn't help if it
>> takes forever for the algorithm to ramp up to an effective drop
>> frequency (i.e., something greater than 1 drop/flow/minute).
>>
>>> > It is not hard to see that drop frequency increases with
>>> > sqrt(count). At the first drop, the frequency is 10 drops/sec;
>>> > after 100 drops it is 100 drops/sec; after 1000 drops it is 316
>>> > drops/sec.
>>> >
>>> > On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel
>>> > immediately starts dropping 1% of packets and ramps up to ~10%
>>> > after 100 drops (1.86 secs).
>>>
>>> No, it will wait 100ms after stuff first exceeds the target, then
>>> progressively shoot harder based on the progress of
>>> interval/sqrt(count).
>>
>> Ok. At the first drop it is dropping at a rate of 1 packet/100 msec
>> == 10 drops/sec and ramps up from there. At the 100th drop it is
>> dropping at a rate of 1 packet per 100 msec/sqrt(100) == 1 packet/10
>> msec == 100 drops/sec. This just so happens to occur after 1.86 secs.
>>
>> Aside: as described, CoDel's drop frequency during a congestion
>> epoch increases approximately linearly with time (at a rate of about
>> 50 drops/sec^2 when interval = 100 msec).
>>
>>> Secondly, people have this tendency to measure full-size packets,
>>> or a 1k average packet. The reality is a dynamic range of 64 bytes
>>> to 64k (gso/tso/gro offloads). So bytes is a far better proxy than
>>> packets in order to think about this properly.
>>>
>>> Offloads of various sorts bulking up packet sizes have been a
>>> headache.
>>> I favor reducing mss on highly congested, underbuffered links (and
>>> bob favors sub-packet windows) to keep the "signal strength" up.
>>>
>>> The original definition of a packet (circa 1962) was 1000 bits,
>>> with up to 8 fragments. I do wish the materials that were the
>>> foundation of packet behavior were online somewhere...
>>
>> I don't see how this has anything to do with the text of the draft
>> or my questions.
>>
>>> > This seems like a reasonable range. On a 10 GE link serving 2.5
>>> > Mpps on average, CoDel would only drop 0.013% of packets after
>>> > 1000 drops (which would occur after 6.18 secs).
>>>
>>> I am allergic to averages as a statistic in the network measurement
>>> case.
>>>
>>> > This doesn't seem to be very effective. It's possible to reduce
>>> > interval to ramp up drop frequency more quickly, but that is
>>> > counter-intuitive because interval should be roughly proportional
>>> > to maximum RTT, which is link-speed independent.
>>>
>>> Except that tcps drop their rates by (typically) half on a drop,
>>> and it is a matter of debate as to when they do on CE.
>>
>> Ex/ 10 GE link, ~10K flows (average). During a congestion epoch,
>> CoDel with interval = 100 msec reaches a drop rate of 257
>> packets/sec after 5 secs. How many flows is that effectively
>> managing?
>>
>>> > Unless I am mistaken, it appears that the control law should be
>>> > normalized in some way to average packet rate. On a high-speed
>>> > link, it might be common to drop multiple packets per msec, so it
>>> > also isn't clear to me whether the drop frequency needs to be
>>> > recalculated on every drop, or whether it could be recalculated
>>> > over a shorter interval (e.g., 5 msec).
>>>
>>> Pie took the approach of sampling, setting a rate for shooting,
>>> over a 16ms interval. That's pretty huge, but also low-cost in some
>>> hardware.
>>>
>>> Codel's timestamp-per-packet control law is continuous (but you do
>>> need to have a cheap packet-timestamping ability).
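Steve's checkpoints (100th drop at ~1.86 s, and ~257 drops/sec after 5 s on the 10 GE example) are easy to reproduce by iterating the control law directly (my sketch; assumes sustained congestion in a single queue, interval = 100 ms, count incrementing on every drop):

```python
import math

# Under sustained congestion the k-th drop happens interval/sqrt(k)
# after the (k-1)-th, so drop times and rates follow directly.
def time_of_drop(k, interval=0.100):
    return sum(interval / math.sqrt(i) for i in range(1, k + 1))

def rate_at_drop(k, interval=0.100):
    return math.sqrt(k) / interval          # instantaneous drops/sec

# the 4 Mbps example's checkpoints
print(round(time_of_drop(100), 2), round(rate_at_drop(100)))  # 1.86 100

# the 10 GE example: drops accumulated after 5 s of congestion
k = 1
while time_of_drop(k + 1) <= 5.0:
    k += 1
print(k, round(rate_at_drop(k)))        # ~660 drops, ~257 drops/sec
# spread across ~10K flows, that signals each flow roughly once
# every 40 s - far short of once per rtt per flow
```

This also makes the "aside" visible: rate(t) ~= t / (2 * interval^2), i.e. a linear ramp of 50 drops/sec^2 at interval = 100 ms, regardless of how many flows need a signal.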
>>>
>>> Certainly in all cases more work is needed to address the problems
>>> 100gbps rates pose in general, and it is not all just queueing
>>> theory! A small packet is .62 *ns* in that regime. A benefit of fq
>>> in this case is that you can parallelize fib table lookups across
>>> multiple processors/caches, and of fq_codel that all the codels
>>> operate independently.
>>
>> High-speed metro/core links need AQM, too. I believe that
>> draft-ietf-aqm-codel-01 doesn't work for these links in the general
>> case (e.g., large #flows, recommended value for interval, no FQ).
>> IMHO the draft should say something about this, or should adjust the
>> algorithm accordingly.
>>
>> Regards,
>>
>> // Steve
>>
>> _______________________________________________
>> aqm mailing list
>> [email protected]
>> https://www.ietf.org/mailman/listinfo/aqm
>
> --
> Dave Täht
> What will it take to vastly improve wifi for everyone?
> https://plus.google.com/u/0/explore/makewififast
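On the per-packet time budget mentioned above: the bare serialization arithmetic (my numbers; preamble and inter-frame gap ignored) puts a small packet at a few ns at 100 Gbit/s rather than a fraction of one:

```python
# Serialization time of one packet at line rate, in nanoseconds.
def pkt_ns(size_bytes, rate_bps):
    return size_bytes * 8 / rate_bps * 1e9

print(pkt_ns(64, 100e9))     # ~5.12 ns for a minimum-size packet
print(pkt_ns(1500, 100e9))   # ~120 ns for a full-size packet
# Either way, per-packet aqm work at 100gbps has a budget of a handful
# of ns unless it is pipelined or parallelized (as fq permits).
```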
--
Dave Täht
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast

_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
