I certainly would like you to try factoring fq_codel's effects into your calculations. 1024 queues is the default, with up to 64k queues possible. If you assume 10k flows, you can reasonably assume a fairly even distribution across the available queues, with 1k codels compensating madly. I tend to think most of the faster future world is going to have multiple queues, with every 10GigE device I am familiar with coming with 64 or more.
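To put a rough number on "pretty equal distribution": hashing flows into queues is a classic balls-in-bins situation. A quick python sketch of mine (it models the hash as uniform random assignment, not the kernel's actual flow hashing) shows how 10k flows spread over the default 1024 queues:

```python
# Rough sketch: how evenly do 10k flows land in fq_codel's default
# 1024 queues, if we model the stochastic hash as uniform random?
import random

random.seed(1)
NQUEUES, NFLOWS = 1024, 10_000

bins = [0] * NQUEUES
for _ in range(NFLOWS):
    bins[random.randrange(NQUEUES)] += 1

occupied = sum(1 for b in bins if b > 0)
print("mean flows/queue:", NFLOWS / NQUEUES)   # 9.765625
print("max flows/queue: ", max(bins))
print("queues in use:   ", occupied, "of", NQUEUES)
```

The point being that each codel instance ends up managing a handful of flows, not the 10k aggregate.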
Secondly, you appear to be thinking in terms of very abrupt changes in demand, which are not characteristic of a core link. (They are certainly characteristic of a route flap, however, so I am not dissing your work.) My favorite paper of last year was this one, where changes in demand had timescales of hours, not seconds, and I would have loved to have been in a position to try out every new aqm and fq+aqm algorithm on a core-like router (many, many 10GigE ports is totally feasible on x86) against the workloads here:

http://www.caida.org/publications/papers/2014/measurement_analysis_internet_interconnection/measurement_analysis_internet_interconnection.pdf

Lastly, spreadsheets are easier to deal with than emails; if you want to put one up with the various algorithms compared, that would be a goodness!

The "all TCP sessions are long-lived" assumption is a truly fatal error in thinking about how this stuff works, IMHO, but I realize that it is hard to reason about.

On Wed, Jun 10, 2015 at 1:58 PM, Agarwal, Anil <[email protected]> wrote:
> To follow up on Steve's posting - here is a simple (simplistic) analysis
> to estimate the time taken by CoDel to converge to the right value of
> the interval I. Perhaps similar analysis has been done elsewhere.
>
> Analysis of single-queue CoDel for a large number of TCP connections.
>
> Given:
> Link capacity = C bps
> Number of active TCP connections = n
> Round-trip time = rtt seconds
> Packet size = PS bytes
> CoDel initial interval = I0 seconds
>
> Assume:
> All TCP connections are long-lived.
>
> At steady state, for every TCP connection, cwnd increases linearly from
> W/2 to W by one packet per RTT, then drops back to W/2 when a packet is
> dropped or ECN-marked.
>
> Derive:
> bps_per_conn = C / n
> Conn max window size = W = bps_per_conn * rtt / (8 * PS) (packets)
> Ideal drop interval per conn = DI1 = (W / 2) * rtt (seconds)
> Ideal drop rate per conn = DR1 = 1 / DI1 (packets/sec)
> Aggregate ideal drop rate = DR = DR1 * n (packets/sec)
> Ideal CoDel interval = II = 1 / DR (seconds)
>
> CoDel packet drop rate increase per sec = DRI =~ 0.5 / I0^2
> This can be derived from the derivative of the equation I(n) = I0 /
> sqrt(n), where I(n) is the interval size after n consecutive congested
> intervals. Under sustained congestion, the packet drop rate increases
> by DRI packets/sec, every second.
> DRI does not depend on C, n or rtt!
>
> CoDel time for the interval to converge to II = T = DR / DRI (seconds)
>
> or, expanding (the factor of 8 converts PS bytes to bits),
>
> T = 32 * PS / C * (n * I0 / rtt)^2 (seconds)
>
> Comments:
> The time T taken by CoDel to reach the correct I value is -
> - Inversely proportional to link capacity C
> - Proportional to the square of the number of active TCP connections
> - Proportional to the square of the initial interval value
> - Inversely proportional to the square of rtt
>
> If link capacity increases and the number of TCP connections increases
> correspondingly, then T will increase.
>
> During time 0 to T, the queue will likely remain full and packets will
> experience tail-drops. Of course, one could argue that the number of
> connections n is likely to increase slowly, perhaps more slowly than
> the convergence time of I (?).
>
> In steady state, large sudden changes in n will likely result in
> tail-drops.
>
> This analysis is probably not applicable when rtt > I0 or if W is too
> small.
>
> Numerical examples:
> 1. C = 1 Gbps, n = 1000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
> DRI = 50 packets/sec/sec
> W = 8.33 packets
> Drop rate DR = 2400 packets/sec
> T = 48 seconds
>
> 2. C = 10 Gbps, n = 10000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
> DRI = 50 packets/sec/sec
> W = 8.33 packets
> Drop rate DR = 24000 packets/sec
> T = 480 seconds
>
> 3. C = 10 Gbps, n = 11000, rtt = 100 ms, PS = 1500 bytes, I0 = 100 ms
> DRI = 50 packets/sec/sec
> W = 7.58 packets
> Drop rate DR = 29040 packets/sec
> T = 580 seconds
> (A 10% increase in the number of TCP connections caused T to increase
> by 100 seconds.)
>
> This does seem to indicate that for core and edge routers that handle
> a large number of connections, if the number of connections changes by
> a large amount, then the convergence of the CoDel interval value will
> likely take a correspondingly large amount of time.
>
> This is probably true for other AQM schemes as well - is there any
> published analysis on the subject?
>
> The example I have used may be a bit extreme, but I hope it illustrates
> the issue Steve is pointing to.
>
> Regards,
> Anil Agarwal
> ViaSat Inc.
>
> -----Original Message-----
> From: aqm [mailto:[email protected]] On Behalf Of Steven Blake
> Sent: Tuesday, June 09, 2015 4:40 PM
> To: Dave Taht
> Cc: [email protected]
> Subject: Re: [aqm] CoDel on high-speed links
>
> On Tue, 2015-06-09 at 12:44 -0700, Dave Taht wrote:
>
>> The below makes several mis-characterisations of codel in the first
>> place, and then attempts to reason from there.
>
> Hmmm...
>
>> On Tue, Jun 9, 2015 at 9:11 AM, Steven Blake <[email protected]> wrote:
>> > I have a question about how CoDel (as defined in
>> > draft-ietf-aqm-codel-01) behaves on high-speed (e.g., >= 1 Gbps)
>> > links. If this has been discussed before, please just point me in
>> > the right direction.
>> >
>> > In the text below, I'm using "drop" to mean either packet
>> > discard/ECN mark.
>> > I'm using (instantaneous) "drop frequency" to mean the inverse of
>> > the interval between consecutive "drops" during a congestion epoch,
>> > measured in drops/sec.
>> >
>> > The control law for CoDel computes the next time to drop a packet,
>> > and is given as:
>> >
>> >     t + interval/sqrt(count)
>> >
>> > where t is the current time, interval is a value roughly
>> > proportional to maximum RTT (recommended 100 msec), and count is
>> > the cumulative number of drops during a congestion epoch.
>>
>> No. Count is just a variable to control the curve of the drop rate.
>> It is not constantly incremented, either; it goes up and down based
>> on how successful it is at controlling the flow(s), only incrementing
>> while latency exceeds the target, decrementing slightly after it
>> stays below the target. The time spent below the target is not
>> accounted for, so you might have a high "bang-bang" drop rate
>> retained when something goes above the target from below.
>>
>> This subtlety is something people consistently miss and something I
>> tried to elucidate in the first Stanford talk.
>
> I specifically mentioned "during a congestion epoch", but let me be
> more precise: count is continuously incremented during an extended
> period where latency exceeds the target (perhaps because CoDel isn't
> yet dropping hard enough). Correct?
>
> The fact that the drop frequency doesn't ramp down quickly when
> congestion is momentarily relieved is good, but doesn't help if it
> takes forever for the algorithm to ramp up to an effective drop
> frequency (i.e., something greater than 1 drop/flow/minute).
>
>> > It is not hard to see that the drop frequency increases with
>> > sqrt(count). At the first drop, the frequency is 10 drops/sec;
>> > after 100 drops it is 100 drops/sec; after 1000 drops it is 316
>> > drops/sec.
>> >
>> > On a 4 Mbps link serving say 1000 packets/sec (on average), CoDel
>> > immediately starts dropping 1% of packets and ramps up to ~10%
>> > after 100 drops (1.86 secs).
>>
>> No, it will wait 100 ms after stuff first exceeds the target, then
>> progressively shoot harder based on the progress of
>> interval/sqrt(count).
>
> Ok. At the first drop it is dropping at a rate of 1 packet/100 msec ==
> 10 drops/sec and ramps up from there. At the 100th drop it is dropping
> at a rate of 100 msec/sqrt(100) == 1 packet/10 msec == 100 drops/sec.
> This just so happens to occur after 1.8 secs.
>
> Aside: as described, CoDel's drop frequency during a congestion epoch
> increases approximately linearly with time (at a rate of about 50
> drops/sec^2 when interval = 100 msec).
>
>> Secondly, people have this tendency to measure full-size packets, or
>> a 1k average packet. The reality is a dynamic range of 64 bytes to
>> 64k (gso/tso/gro offloads). So bytes is a far better proxy than
>> packets for thinking about this properly.
>>
>> Offloads of various sorts bulking up packet sizes have been a
>> headache. I favor reducing the MSS on highly congested, underbuffered
>> links (and Bob favors sub-packet windows) to keep the "signal
>> strength" up.
>>
>> The original definition of a packet (circa 1962) was 1000 bits, with
>> up to 8 fragments. I do wish the materials that were the foundation
>> of packet behavior were online somewhere...
>
> I don't see how this has anything to do with the text of the draft or
> my questions.
>
>> > This seems like a reasonable range. On a 10 GE link serving 2.5
>> > Mpps on average, CoDel would only drop 0.013% of packets after 1000
>> > drops (which would occur after 6.18 secs).
>>
>> I am allergic to averages as a statistic in the network measurement
>> case.
>>
>> > This doesn't seem to be very effective. It's possible to reduce
>> > interval to ramp up the drop frequency more quickly, but that is
>> > counter-intuitive because interval should be roughly proportional
>> > to maximum RTT, which is link-speed independent.
>>
>> Except that TCPs drop their rates by (typically) half on a drop, and
>> it is a matter of debate as to when they do on CE.
>
> Ex/ 10 GE link, ~10K flows (average). During a congestion epoch, CoDel
> with interval = 100 msec starts dropping 257 packets/sec after 5 secs.
> How many flows is that effectively managing?
>
>> > Unless I am mistaken, it appears that the control law should be
>> > normalized in some way to the average packet rate. On a high-speed
>> > link, it might be common to drop multiple packets per msec, so it
>> > also isn't clear to me whether the drop frequency needs to be
>> > recalculated on every drop, or whether it could be recalculated
>> > over a shorter interval (e.g., 5 msec).
>>
>> PIE took the approach of sampling, setting a rate for shooting, over
>> a 16 ms interval. That's pretty huge, but also low cost in some
>> hardware.
>>
>> Codel's timestamp-per-packet control law is continuous (but you do
>> need to have a cheap packet-timestamping ability).
>>
>> Certainly in all cases more work is needed to address the problems
>> 100 Gbps rates have in general, and it is not just all queue theory!
>> A small packet is .62 *ns* in that regime. A benefit of fq in this
>> case is that you can parallelize fib table lookups across multiple
>> processors/caches, and of fq_codel is that all the codels operate
>> independently.
>
> High-speed metro/core links need AQM, too. I believe that
> draft-ietf-aqm-codel-01 doesn't work for these links in the general
> case (e.g., large #flows, recommended value for interval, no FQ).
> IMHO the draft should say something about this, or should adjust the
> algorithm accordingly.
>
> Regards,
>
> // Steve
>
> _______________________________________________
> aqm mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/aqm

--
Dave Täht
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
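P.S. For those that want to check Anil's DRI =~ 0.5 / I0^2 figure without doing the calculus, here is a little python sketch of mine. It assumes sustained congestion with count incrementing on every drop, which is the same simplification as in the analysis; it just walks the codel drop schedule and bins drops per second:

```python
# Walk CoDel's drop schedule under sustained congestion and measure how
# fast the drop rate ramps.  Prediction: ~0.5 / I0^2 drops/sec per sec.
I0 = 0.100                      # initial interval, seconds (the 100 ms default)

t, count, drop_times = 0.0, 1, []
while t < 10.0:                 # 10 seconds of sustained congestion
    t += I0 / count ** 0.5      # next drop at t + I0/sqrt(count)
    count += 1
    drop_times.append(t)

# drops falling in each one-second window
rates = [sum(1 for d in drop_times if s <= d < s + 1) for s in range(10)]
slope = [rates[i + 1] - rates[i] for i in range(9)]

print("drops/sec per window:", rates)
print("ramp per second:     ", slope)          # each ~50
print("predicted 0.5/I0^2 = ", 0.5 / I0 ** 2)  # 50.0
```

Note how the drop rate grows linearly in time, which is exactly the slow-ramp behavior Steve is worried about at high flow counts.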

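P.P.S. And a companion python sketch of Anil's formulas themselves (my transcription of his derivation, with PS in bytes, hence the factor of 8), which reproduces his three numerical examples:

```python
# Convergence-time estimate for single-queue CoDel, per Anil's analysis.
#   C   link capacity (bps),   n   active TCP connections,
#   rtt round-trip time (s),   PS  packet size (bytes),
#   I0  initial interval (s)
def codel_convergence_time(C, n, rtt, PS, I0):
    W = (C / n) * rtt / (8 * PS)    # per-conn max cwnd, packets
    DR = 2 * n / (W * rtt)          # aggregate ideal drop rate, pkts/sec
    DRI = 0.5 / I0 ** 2             # drop-rate ramp, pkts/sec per second
    return W, DR, DR / DRI          # last value is T, seconds

for C, n in [(1e9, 1_000), (10e9, 10_000), (10e9, 11_000)]:
    W, DR, T = codel_convergence_time(C, n, rtt=0.100, PS=1500, I0=0.100)
    print(f"C={C:.0e}  n={n:>5}  W={W:.2f} pkts  DR={DR:.0f}/s  T={T:.1f} s")
```

Plugging in other capacities and flow counts is then a one-liner, which beats redoing the arithmetic by email.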