I can’t speak for Kathy, but as Cake’s author I have some insight into the
changes I’ve made to the associated version of Codel.
Ronald Bless:
1. after exiting dropping state, count is usually reset, unless "the
next drop state is entered too close to the previous one”.
In fact, in most cases (ie. where the true RTT is near to or less than the one
assumed by the interval parameter, and the flow is persistent) the drop state
will be re-entered sufficiently soon to prevent the reset. The backoff
behaviour for that case is thus more significant in practice.
The standard backoff behaviour is to reduce count by just 2. This is obviously
wrong in the case where count has already reached a significant value; with a
high drop frequency, count will climb a lot during a single RTT. But if the
elevated drop frequency is necessary and sufficient to control the flow, we
don’t want it to grow indefinitely.
Cake’s version halves the count, reducing the drop frequency by a factor of
sqrt(2) after a pause. This multiplicative backoff prevents count from growing
indefinitely, and allows it to reduce gradually to a lower value if that
becomes appropriate due to reduced network load.
Anil's suggestion of continuously reducing count during the non-dropping state
is an interesting alternative which might also make sense. It has the pleasing
property of relating the backoff rate to the length of time spent outside the
dropping state.
2. is 'count' supposed to be reset or saturated on overflow and what
should be its maximum value (it makes a difference whether you are using
16-, 32-, or 64-bit counter)?
It’s clear to me that a saturating increment is necessary to deal with
unresponsive flows adequately. Otherwise, the drop frequency will wrap around
to a low value periodically when it needs to be consistently high.
My test case for this involves 50 upstream TCP flows on a narrow link (say
10Mbit) and a single downstream TCP flow on a wide link (say 100Mbit). In this
configuration, the acks for the downstream flow constitute an unresponsive (in
fact, anti-responsive) upstream flow of greater throughput demand than the
individual upstream TCP flows. Anti-responsive means that dropping packets
actually increases the network load, because reducing the ack density permits
higher throughput on the downlink, which in turn generates additional acks on
the uplink.
Since Cake uses very robust flow isolation, simply delivering all the acks in
order, using a fair-share of the bandwidth, would severely limit the throughput
of the downstream flow. However, the standard Codel behaviour resulted in
wildly oscillating throughput on the downstream flow. Straightforward
calculations suggested that the count variable was probably wrapping.
At first, I only put in the modified backoff behaviour mentioned above. Since
the downstream flow was often reaching full throughput, I thought that
occasionally the drop rate was managing to empty the ack queue, making the
backoff behaviour significant. This assumption turned out to be correct, but
the backoff modification was insufficient to completely correct the behaviour.
Only when I changed count to use a saturating increment instead of a wrapping
one did the downstream flow achieve full throughput consistently.
It’s possible that if count was a wider variable (32 instead of 16 bits) that
the backoff modification would have been sufficient to prevent wrapping in this
case. However, using saturating arithmetic is inexpensive on modern CPUs (many
have dedicated instructions which could be used by highly optimised
implementations) and is the most robust solution.
Note that in the absence of robust flow isolation, eg. with plain Codel or PIE,
the upstream TCP flows will back off to make room for all the acks. This does
not require a particularly high drop rate, but it reduces the upstream goodput
significantly. Cake instead achieves high goodput in both directions in this
especially challenging case.
Anil Agarwal:
I have attached the updated document.
There are indeed a lot of suggestions here, which I haven’t yet fully
assimilated. I’ll cherry-pick a few that stood out:
When RTT is small compared to the default Interval, CoDel is slow in reacting
and controlling queue size/delay during TCP slow start.
For example, if Interval = 100 ms and RTT = 20 ms, CoDel will not drop or mark
a packet for the first 100 ms of a new TCP connection; if link rate = 10 Mbps,
initial window size = 14600 bytes, then during this period, the TCP cwnd will
grow to 104 kbytes; queuing delay will be 63 ms.
Cake’s modified Codel does vary its sensitivity according to how quickly the
queue is filling. With the default parameters, it may trigger as quickly as
35ms after a large burst arrives, or as slowly as 200ms if the queue is growing
very slowly.
It could be improved further in theory, by giving Codel visibility of the
length and drain rate of the queue (and thus the ability to predict future
delay) rather than only observing the exit delay. However, in the general case
the drain rate is variable, so this may be difficult to make robust.
Ultimately, triggering instantly when the queue begins to fill is optimal to
deal with slow-start, which will double its cwnd further while the congestion
signal is in transit, before responding with a halving. However, an instant
trigger is suboptimal for the congestion-avoidance phase of TCP, which is why
Codel always delays its response. Flow isolation is a good (if imperfect)
countermeasure for the suboptimal slow-start behaviour this causes.
My “Explicit Load Regulation” proposal, which hasn’t yet been widely
circulated, aims to get closer to optimal behaviour in both situations. It is
an extension to ECN behaviour, giving the option of more moderate signals than
ECN permits, while being easier to deploy incrementally than DCTCP is.
Hopefully I’ll be able to write it up properly later.
The draft rfc states - “For example, there is about an 86% probability that no
more than two of the 100 VoIP sessions will be involved in any given collision”
Based on analytical equations for hash collision probabilities (I derived them
independently some time ago), the probability of no collision = 90.78%…
Cake’s set-associative hash function reduces the probability of flow collisions
further, to negligible values for the above number of flows.
Alternatively, codel_dequeue() can be re-structured as two routines…
Certainly a refactoring would be beneficial for readability. I haven’t got
around to doing it yet.
- Jonathan Morton
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm