Dave,
It says the point at which cake enters drop state depends on how quickly
the queue is growing.
But I don't see anything on the page you referenced about a better
curve. I looked under AQM and under other possibly relevant headings.
I meant to say in my mail to Andrew that the rate at which the control
law increases the dropping frequency ought to depend on how far the
queue is from target, or even how quickly it is diverging from target.
I've always said that this control law should not be just a hard-coded
thing that has no dependency on how the queue is moving.
Also below I've restated a couple of points I made back in 2013 when I
first posted this.
On 12/11/13 22:30, Bob Briscoe wrote:
5/ My analysis also shows that the rate of increase in drop probablity
is inversely proportional to the link rate. I don't believe this is
desirable, as it will require tuning for the expected number of flows
on the link. However it is probably OK for variable rate links.
6/ The rate of increase of drop probability is also inversely
proportional to the square of 'interval'. I suggest a new constant is
defined for the control law, rather than relating the control law to
interval in this way. Otherwise, when anyone tunes interval, they will
get undesirable changes to the control law (e.g. reducing interval 10x
[to 10ms] would make the control law 100x more sluggish).
This last point is particularly important. If the control law is still
to be hard-coded, it should relate to another variable RTT_ave, not
interval. TO decide when to enter dropping state, interval should be set
to the likely worst-case RTT (in the low hundreds of ms). But, If
hard-coded, the dynamics of the control law will be correct most often
if it depends on the /most likely/ RTT, which these days with CDNs is
perhaps 10-25ms (depending on population density in a country).
There's another reason for having two separate variables, not just
'interval' for both. I've said the following before somewhere as well:
For ECN marking, 'interval' can be set much lower (or even zero) because
being on a hair-trigger with ECN marks doesn't matter, whereas it does
for drops.{Note 1} Then CoDel will signal ECN without the sluggish delay
it has to have before it starts dropping. However, once it has entered
dropping state, the ECN control law still needs to be based on the same
RTT_ave as for drop, because the control law is trying to mimic the
dynamics of the congestion control, which RFC3168 says should be the
same for ECN as for drop.{Note 2}
Bob
{Note 1}: The faster dynamics won't affect the long-term stable rate
between an ECN and a non-ECN flow, because by definition long-term
stable means the dynamics have settled.
{Note 2}: As suggested in draft-briscoe-aqm-dualq-coupled-00, if the
congestion control were radically different, the packets would probably
need another identifier to distinguish them from 'classic' ECN.
On 30/09/15 10:00, Dave Taht wrote:
On Wed, Sep 30, 2015 at 1:50 AM, Bob Briscoe <[email protected]> wrote:
Andrew,
I am also not so interested in an AQM dealing directly with unresponsive
traffic - I prefer to keep policing and AQM as separately deployable
functions, because AQM should be policy-neutral, whereas policing inherently
involves policy.
My concern was merely that CoDel's linear increase in drop probability can
take a long time to reach where it intends to get to. I would have thought
some form of exponential increase, or at least super-linear, would have been
more responsive to changing traffic conditions. I.e., rather than have to
answer the question "how quickly should drop probability increase?", make it
increase increasingly quickly.
Early on, Rong Pan showed that it takes CoDel ages to bring high load under
control. I think this linear increase is the reason.
cake uses a better curve for codel, but we still need to do more
testing in the lab.
http://www.bufferbloat.net/projects/codel/wiki/CakeTechnical
Bob
On 30/09/15 01:42, Andrew Mcgregor wrote:
Hmm, that's really interesting.
Most interesting is that my understanding is that the control law was
intended to deal with aggregates of mostly TCP-like traffic, and that an
overload of unresponsive traffic wasn't much of a goal; this seems like
vaguely reasonable behaviour, I suppose, given that pathological situation.
But I don't have a way to derive the control law from first principles at
this time (I haven't been working on that for a long time now).
On 25 September 2015 at 06:27, Bob Briscoe <[email protected]> wrote:
Toke,
Having originally whinged that no-one ever responded to my original 2013
posting, now it's my turn to be embarrassed for having missed your
interesting response for over 3 months.
Cool that the analysis proves correct in practice - always nice.
The question is still open whether this was the intention, and if so why
this particular control law was intended.
I would rather we started from a statement of what the control law ought
to do, then derive it.
Andrew McGregor said he would have a go at this question some time ago...
Andrew?
Bob
On 07/06/15 20:27, Toke Høiland-Jørgensen wrote:
Hi Bob
Apologies for reviving this ancient thread; been meaning to get around
to it sooner, but well... better late than never I suppose.
(Web link to your original mail, in case Message-ID referencing breaks:
https://www.ietf.org/mail-archive/web/aqm/current/msg00376.html ).
Having recently had a need to understand CoDel's behaviour in more
detail, your analysis popped out of wherever it's been hiding in the
back of my mind and presented itself as maybe a good place to start. :)
So anyhow, I'm going to skip the initial assertions in your email and
focus on the analysis:
Here's my working (pls check it - I may have made mistakes)
_________________
For brevity, I'll define some briefer variable names:
interval = I [s]
next_drop = D [s]
packet-rate = R [pkt/s]
count = n [pkt]
>From the CoDel control law code:
D(n) = I / sqrt(n)
And the instantaneous drop probability is:
p(n) = 1/( R * D(n) )
Then the slope of the rise in drop probability with time is:
Delta p / Delta t = [p(n+1) - p(n)] / D(n)
= [1/D(n+1) - 1/D(n)] / [ R * D(n) ]
= sqrt(n) * [sqrt(n+1) - sqrt(n)] /
[R*I*I]
= [ sqrt(n(n+1)) - n ] / R*I^2
I couldn't find anything wrong with the derivation. I'm not entirely
sure that I think it makes sense to speak about an "instantaneous drop
probability" for an algorithm that is not probabilistic in nature.
However, interpreting p(n) as "the fraction of packets dropped over the
interval from D(n) to D(n+1)" makes sense, I guess, and for this
analysis that works.
At count = 1, the numerator starts at sqrt(2)-1 = 0.414.
Amd as n increases, it rapidly tends to 1/2.
So CoDel's rate of increase of drop probability with time is nearly
constant (it
is always between 0.414 and 0.5) and it rapidly approaches 0.5 after a few
drops, tending towards:
dp/dt = 1/(2*R*I^2)
This constant increase clearly has very little to do with the square-root
law of
TCP Reno.
In the above formula, drop probability increases inversely proportional to
the
packet rate. For instance, with I = 100ms and 1500B packets
at 10Mb/s => R = 833 pkt/s => dp/dt = 6.0% /s
at 100Mb/s => R = 8333 pkt/s => dp/dt = 0.6% /s
I also tried to test this. I configured CoDel (on a Linux 4.0 box) on
1Mbps, 2Mbps and 10Mbps links with interval settings of 1 second and
500ms, and a total packet limit of 100k packets. This was to make it
deliberately slower to react (so the change in drop probability is more
visible), and to make sure no packets are dropped from queue overflow.
I then sent an unresponsive UDP stream over the link at 110% of the link
capacity (as passed to Iperf, so approximately), and collected the
output of `tc -s qdisc` every 0.2 seconds.
The attached plot is of 'pkts dropped / (pkts sent + pkts dropped)' in a
2-second sliding window over the duration of the test (the plot is also
available here:
https://kau.toke.dk/ietf/codel-drop-rate/codel-drop-rate.svg ).
I've included linear trend lines from the initial time to the point of
maximum drop probability, and as is apparent from the plot, got quite a
good fit (r>0.99 for all six data sets). The legend includes the slopes
of the linear fits for each of the data sets, which are not too far from
what your analysis predicts (and I'm guessing the difference can be
attributed to the difference in exact packet rates, but I haven't
checked).
The Flent data files with the qdisc stats over time (readable by the
newest git version of Flent), as well as the Python script I used to
create the graph are available here:
https://kau.toke.dk/ietf/codel-drop-rate/
So, in short: It seems that CoDel's "drop rate" does increase linearly
in the presence of a persistent queue, and that the rate of increase
depends on both the interval and the link rate.
Now, I'll refrain from commenting on whether or not this is "bad", or
indeed if it is contrary to design. It was surprising to me at least, so
I thought I'd share my findings, in the hope that someone would either
find them useful or tell me how they're wrong (or both!). :)
-Toke
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
--
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
--
Andrew McGregor | SRE | [email protected] | +61 4 1071 2221
--
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
--
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm