John, inline...
At 12:16 26/10/2013, John Leslie wrote:
Bob Briscoe <[email protected]> wrote:
>
> Exec summary
> * Early tests show promise that we may have found a way to make the
> ultra-low queuing delay of data centre TCP incrementally deployable
> on the public Internet
> * For rtcweb, we need to address
> a) cc for r-t media [rmcat w-g in progress]
> b) Making TCP nicer
> c) minimise ability of TCP to bloat queues [AQM w-g now in progress]
> This addresses b) & c)
>
> The problem
> * All AQMs delay dropping for about one (hard-coded) worst-case RTT,
> in case a burst dissipates (allegedly a 'good queue' according to Van
> Jacobson)
This assertion is going to need a lot of support.
Bob is a man after my own heart suggesting that an ECN notification
may be sent earlier than a packet drop would be indicated. I don't know
if we can get there; but IMHO that is essential to getting ECN deployed
and used.
I don't think I agree with Bob that what's hard-coded is necessarily
a "worst-case" RTT -- and I'm quite sure I'm not willing to make any
pronouncement about "all AQMs".
I suggest the talk might be more useful if Bob outlined the AQMs
currently in widespread use and detailed _how_ they delay dropping
for an estimated RTT.
You're right. The 'about' in my sentence was meant to indicate some
leeway. The specifics depend on each AQM...
The only AQM I know of that doesn't smooth over some nominal RTT is
DCTCP itself.
* CoDel was designed for 'interval' to be a worst-case (largest) RTT,
which it recommends to be set to 100ms. Once the queue has exceeded
threshold, CoDel delays for time 'interval' before starting to signal
congestion.
- I already said how sluggish CoDel would be for a flow with a much
shorter RTT than CoDel's interval (others have made this point, e.g.
for data centres:
https://lists.bufferbloat.net/pipermail/codel/2012-August/000448.html).
- And in the other direction, we already know that utilisation
suffers fairly badly for flows with RTT significantly larger than 100ms.
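To make the sluggishness concrete, here is a minimal Python sketch of
CoDel's gating behaviour (the class name and simplified state machine
are mine; the real CoDel algorithm has more state, but the
interval-long wait before the first signal is the point):

```python
# Minimal sketch of CoDel's signal gating (names and structure are
# illustrative; real CoDel has a fuller state machine).
TARGET = 0.005     # 5 ms sojourn-time target
INTERVAL = 0.100   # 100 ms 'interval', intended as a worst-case RTT

class CodelGate:
    def __init__(self):
        self.first_signal_time = None   # when signalling may begin

    def should_signal(self, sojourn_time, now):
        if sojourn_time < TARGET:
            # Queue is below target: reset and stay silent.
            self.first_signal_time = None
            return False
        if self.first_signal_time is None:
            # Queue just exceeded target: wait a full INTERVAL in
            # case this is only a transient burst.
            self.first_signal_time = now + INTERVAL
            return False
        return now >= self.first_signal_time
```

For a 10ms-RTT flow, that 100ms wait is ten of its own RTTs before
the first signal arrives.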
* PIE suppresses all drops for time max_burst (set to 100ms by
default) from when the drop probability it calculates (but doesn't
necessarily use) first rises above zero. This is very similar to
CoDel, and similar comments are applicable.
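A similar sketch for PIE's burst suppression (again with invented
names and simplified logic; real PIE also ramps its drop probability
via a PI controller):

```python
# Sketch of PIE's max_burst allowance (illustrative only).
MAX_BURST = 0.100   # 100 ms default burst allowance

class PieBurst:
    def __init__(self):
        self.burst_allowance = MAX_BURST

    def drops_permitted(self, drop_prob, dt):
        if drop_prob <= 0:
            # No congestion calculated: refill the burst allowance.
            self.burst_allowance = MAX_BURST
            return False
        if self.burst_allowance > 0:
            # Congestion calculated, but still inside the burst
            # window: suppress all drops and count the window down.
            self.burst_allowance -= dt
            return False
        return True   # window expired: drops (or marks) allowed
```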
* RED requires the constant for its exponentially weighted moving
average (w_q) to be set taking into account how many packets are
likely to arrive at the link in a 'typical' RTT. Reverse-engineering
the values recommended by Sally Floyd in the RED paper and on her
famous RED parameters Web page
<http://www.icir.org/floyd/REDparameters.txt> suggests she assumed a
'typical' RTT of about 130ms.
[BTW, I know of people who don't calculate w_q, but just use the
value of "0.002" that Sally recommended for her 45Mb/s link in the
original RED paper simulations (and repeated at
<http://www.icir.org/floyd/REDparameters.txt>). This was calculated
assuming about 500 packets arrive at a link (from all flows) in a
typical RTT. Links have got a lot faster since 1993. Nonetheless, she
was considering 45Mb/s for an aggregated link in those days, and it
happens to be about right for a single user today.]
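For anyone who wants to check the ~130ms figure, the arithmetic is
below, assuming 1500-byte packets and the rule of thumb that an EWMA
with gain w_q averages over roughly -1/ln(1 - w_q) samples:

```python
# The arithmetic behind the ~130 ms 'typical' RTT implied by
# w_q = 0.002 on a 45 Mb/s link (assumes 1500-byte packets).
import math

w_q = 0.002                                # Sally Floyd's value
packets_per_rtt = -1 / math.log(1 - w_q)   # ~500 packets
link_rate = 45e6                           # 45 Mb/s, as in the RED paper
pkt_bits = 1500 * 8
implied_rtt = packets_per_rtt * pkt_bits / link_rate   # ~0.133 s
```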
> * For a flow with 1/10 or 1/100 of this RTT (e.g. from a CDN or your
> home media server), any congestion signal is delayed tens or hundreds
> of its own RTTs by these AQMs.
Clearly, RTTs differing by a factor of ten are quite common at most
nodes traversed in a typical path; and it seems _very_ suboptimal to
have the responsibility for guessing the RTT at the node which must
drop packets.
For packets that do not support ECN, the dropping node has to guess
the RTT so as not to drop packets unnecessarily: drop is an
impairment as well as a congestion signal, and a transport cannot
'undrop' packets.
Our point though is that a network node doesn't have to mimic this
behaviour for ECN packets, because ECN is not an impairment. So a
transport can un-ECN-mark packets (by smoothing out bursts itself).
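As an illustration of what 'hiding bursts in the host' might look
like, here is a DCTCP-style sketch (the class name and gain value are
assumptions, not a spec): the transport keeps a per-RTT EWMA of the
fraction of ECN-marked packets and scales its response to that,
rather than reacting fully to each mark.

```python
# Illustrative DCTCP-style smoothing in the transport.
class EcnSmoother:
    def __init__(self, g=1/16):
        self.g = g          # EWMA gain (DCTCP suggests 1/16)
        self.alpha = 0.0    # smoothed fraction of marked packets

    def on_rtt_end(self, marked, total):
        # Fold this RTT's marked fraction into the running average.
        frac = marked / total if total else 0.0
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        return self.alpha

    def cwnd_after_marks(self, cwnd):
        # Proportionate reduction instead of a full halving, so a
        # burst of immediate marks doesn't over-punish the flow.
        return cwnd * (1 - self.alpha / 2)
```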
> * A TCP flow in slow-start doesn't need the burst smoothed anyway
> - delaying the signal just makes slow-start overshoot more
> - a TCP in slow-start knows that it won't allow the burst to
> dissipate anyway
A critical point! (It seems obvious to me, but is it obvious to
everyone?)
> The solution: make ECN also mean "Immediate Congestion Notification"?
> * For ECN-capable packets, shift the job of hiding bursts from network to
> host:
> - the network signals ECN with no smoothing delay
> - then the transport can hide bursts of ECN signals from itself
But can we get there from here?
The node doing the ECN notification _can't_ know how the transport
will react; and the transport receiving an ECN notification can't know
whether the forwarding node has "smoothed" the signal. (It is truly a
shame we haven't left any bits for signals like this!)
Well, we do have ECT(1) still only assigned experimentally and never
used, which we could decide to use for this immediate ECN. However,
first I want to see whether people think it might be feasible to just
redefine the meaning of CE.
Rationale: So few buffers have ECN support turned on anyway that we
should be able to redefine ECN so that many more will want to turn it on.
For those AQMs that already support ECN, we believe this
retrospective change will make them only a little worse than they are
already (and the operator can update them by simple reconfiguration
anyway, and is more likely to do so, given these are clearly
early-adopter networks).
> - the transport knows
> o whether it's TCP or RTP etc,
> o whether it's in congestion avoidance or slow-start,
> o and it knows its RTT,
> o so it can know whether to respond immediately or to smooth the
> signals,
> o and if so, over what time
Yes, but it can't know what smoothing may already have been applied.
Yes. If this is a problem, we will have to consider using ECT(1) not CE.
But it's pretty academic when so few buffers support ECN.
The tiny proportion that do support ECN will already smooth by a
'typical RTT' of about 100ms.
If a 20ms RTT flow adds smoothing over its own RTT to this, it will
smooth over 120ms in total.
The main problem there is not the extra 20ms, it's the original
100ms, which we won't lose unless we make this change somehow.
> - then short RTT flows can smooth the signals with only the delay
> of their /own/ RTT
> o so they can fill troughs and absorb peaks that longer RTT
flows cannot
> - a TCP only needs to smooth the signals if in congestion avoidance
> o in slow start, it can respond immediately, thus reducing overshoot
This would, IMHO, improve "slow start".
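The asymmetry could be sketched like this (purely illustrative; the
function and the halving on slow-start exit are assumptions, not a
tested design):

```python
# Illustrative asymmetry: immediate response in slow start,
# smoothed response in congestion avoidance.
def react_to_ce(state, cwnd, alpha):
    if state == "slow_start":
        # No smoothing needed: a single immediate CE mark ends
        # exponential growth at once, reducing the overshoot.
        return "congestion_avoidance", cwnd / 2
    # In congestion avoidance, respond to the smoothed mark
    # fraction alpha instead of to each individual mark.
    return "congestion_avoidance", cwnd * (1 - alpha / 2)
```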
> Incremental Deployment:
> * Immediate congestion notification doesn't need new AQM implementation
> - it can use the widely implemented WRED algorithm with an
> unexpected configuration
Bob is beginning to lose me here. Does he mean that a forwarding node
would apply WRED for both drop and ECN, but with different parameters?
> * The network classifies packets for this AQM treatment based on
> their ECN-capability
> - Without ECN, it smoothes the queue before signalling drops
Bob has lost me now -- apparently he doesn't mean different
parameters... and I don't recognize this "smoothing" step in WRED.
I do mean that a forwarding node would apply WRED for both drop and
ECN, but with different parameters.
Each WRED policy-map includes a setting for this smoothing parameter,
which Cisco calls the exponential-weighting-constant. Many people
don't notice it's there and they just leave it at the default. For
instance, Cisco set it to
2^(-9) ~ 0.002 by default for each of the WRED policy-maps (see
http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fswfq26.html#wp1039982).
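For reference, the averaging WRED applies is just this EWMA, where
Cisco's exponential-weighting-constant n gives a gain of 2^-n (so the
default n=9 gives ~0.002):

```python
# The smoothing WRED applies to the queue length; n is Cisco's
# exponential-weighting-constant (default 9, i.e. w_q ~ 0.002).
def wred_avg(prev_avg, qlen, n=9):
    w_q = 2.0 ** -n
    return (1 - w_q) * prev_avg + w_q * qlen
```

On my reading of the proposal (an assumption, not a tested config),
'immediate' ECN would amount to setting n = 0 in the ECN-marking
policy, so the average collapses to the instantaneous queue length.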
> - With ECN, it signals immediately, without any smoothing delay
> - (as today, the operator can still use WRED with the Diffserv field too)
(Do we need to confuse this discussion by adding diffserv?)
A non-Diffserv network still doesn't need to worry about Diffserv.
I put this in parentheses because, if WRED is used today, it is
usually used with Diffserv, and I didn't want anyone to worry that
they wouldn't be able to continue to do this (e.g. BT use WRED with
Diffserv in enterprise networks, as do many other carriers).
> * For TCP apps, the stack will use 'DCTCP' (we've tweaked it), if the
> ends negotiate ECN with the accurate feedback capability.
Have we settled on "accurate feedback" already? I thought that was
still under discussion. (I don't follow exactly what it adds...)
See response from Richard Scheffenegger. Essentially the TCPM WG has
accepted the requirements doc, but not decided between the mechanisms on offer.
> * It should 'just work' if an RTP app or a Reno TCP uses ECN.
I don't see any way for a Reno transport using ECN to avoid being
starved if ECN arrives earlier (without notice).
We haven't tested legacy Reno with ECN yet (we figured legacy Reno
without ECN is a lot more prevalent, so focused on this first).
Nonetheless, Reno-ECN is unlikely to starve, because starvation is
about long-running behaviour, and once a flow has run for more than a
couple of 100ms RTTs, the immediate ECN signals should be no
different from a smoothed ECN. I suspect Reno-ECN might be worse in
its short-term dynamics. But remember Reno-ECN is likely to be a tiny
corner-case.
> The request:
> * Much more evaluation to do, but first we want to know:
> - if the idea works, would the IETF have an appetite for tweaking
> the definition of ECN so it is merely equivalent to drop in the long
> term, but the dynamics need not be equivalent.
There's a good question there; but I don't think we're ready for it.
At this stage, even we haven't got many answers. So I'm not asking
the IETF to answer the question right now. I'm merely saying, /if/
our idea works, is there at least an /appetite/ in the IETF for
reconsidering the definition of ECN?
We wanted to make the IETF aware of this research early, because it
might want to at least hold off on any actions that would otherwise
close off this option.
And if we find that any change is completely out of the question, we
have to try a different tack (e.g. ECT(1)).
I'd really like to discuss the dynamics of responding more quickly
but perhaps less drastically for almost any real-time flow.
But proving "equivalence in the long term" seems too hard.
This should be the easy part: the longer conditions remain stable,
the closer a smoothed signal comes to an unsmoothed signal, all other
factors being equal.
Equivalence during dynamics is the hard part, and I'm suggesting we
don't sweat too much about that, as long as the performance
evaluations are not too far apart.
> Much better than the ECN that didn't get deployed
> * This is Explicit and Immediate Congestion Notification (EICN?)
> - same wire protocol, much greater benefits
> * The advantage of the original ECN (avoiding congestive loss) was
> too small to be worth the deployment hassle
Actually, I don't agree that was the problem -- instead I believe
the code has been deployed but administratively suppressed because
the operators don't trust the transports. There _is_ a significant
improvement from one-RTT reaction instead of several (to detect a
drop), but the whole process is just too complicated, while the
opportunity for abuse remains obvious.
I agree. That's the 'deployment hassle' side of my sentence - the
extra trust-enhancing mechanisms that seemed necessary were too much
pain for the small gain.
> * Predictable ultra-low latency without loss too (similar to
> DCTCP-ECN) would be worth deploying
I'm optimistic that latency will become an easier argument.
> * But we all thought DCTCP could only be deployed in isolation (e.g.
> data centres)
> - we all thought DCTCP traffic would starve alongside today's TCP traffic
> - because in a DCTCP queue, the ECN threshold is lower than you
> would trigger drop
> - and we thought ECN & drop had to be equivalent.
(I'm not sure we'll succeed at breaking that "equivalence"...)
> * We believe we've found a way to ensure DCTCP-ECN traffic doesn't starve
> - we still make DCTCP-ECN equivalent to drop in the long-run, but
> not in its dynamics
(I'm still not sure it's worth arguing the "long-run".)
I mean competing long-running ECN & non-ECN flows stabilise at
predictable rates, rather than one ratcheting itself down to nothing
over time (starvation).
That's the primary concern of congestion control 'fairness', before
anyone starts worrying about what the relative rates are. Given apps
get different relative rates with different RTTs, with different size
objects or by opening multiple flows, we don't need to sweat so much
about precisely equal flow rates; but we must sweat about stable convergence.
Results so far show that the proposed idea is at least very robust
against starvation.
Bob
________________________________________________________________
Bob Briscoe, BT
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm