John, inline...

At 12:16 26/10/2013, John Leslie wrote:
Bob Briscoe <[email protected]> wrote:
>
> Exec summary
> * Early tests show promise that we may have found a way to make the
> ultra-low queuing delay of data centre TCP incrementally deployable
> on the public Internet
> * For rtcweb, we need to address
>   a) cc for r-t media [rmcat w-g in progress]
>   b) Making TCP nicer
>   c) minimise ability of TCP to bloat queues [AQM w-g now in progress]
>   This addresses b) & c)
>
> The problem
> * All AQMs delay dropping for about one (hard-coded) worst-case RTT,
> in case a burst dissipates (allegedly a 'good queue' according to Van
> Jacobson)

   This assertion is going to need a lot of support.

   Bob is a man after my own heart suggesting that an ECN notification
may be sent earlier than a packet drop would be indicated. I don't know
if we can get there; but IMHO that is essential to getting ECN deployed
and used.

   I don't think I agree with Bob that what's hard-coded is necessarily
a "worst-case" RTT -- and I'm quite sure I'm not willing to make any
pronouncement about "all AQMs".

   I suggest the talk might be more useful if Bob outlined the AQMs
currently in widespread use and detailed _how_ they delay dropping
for an estimated RTT.

You're right. The 'about' in my sentence was meant to indicate some leeway. The specifics depend on each AQM...

The only AQM I know of that doesn't smooth over some nominal RTT is DCTCP itself.

* CoDel was designed for 'interval' to be a worst-case (largest) RTT, which it recommends setting to 100ms. Once the queue has exceeded the target, CoDel waits for time 'interval' before starting to signal congestion.

- I already said how sluggish CoDel would be for a flow with a much shorter RTT than CoDel's 'interval' (others have made this point, e.g. for data centres: https://lists.bufferbloat.net/pipermail/codel/2012-August/000448.html).
- And in the other direction, we already know that utilisation suffers fairly badly for flows with an RTT significantly larger than 100ms.
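To make the delay concrete, here is a minimal Python sketch of CoDel's signalling logic (simplified from the published pseudocode; the class and variable names are mine, not from any implementation):

```python
# Minimal sketch of CoDel's signalling delay (simplified; constants follow
# the CoDel recommendations, names are illustrative).

TARGET = 0.005     # 5ms: acceptable standing-queue sojourn time
INTERVAL = 0.100   # 100ms: the recommended worst-case RTT

class CoDelSketch:
    def __init__(self):
        self.first_signal_time = None  # when signalling may start

    def should_signal(self, sojourn_time, now):
        """True once sojourn has stayed above TARGET for a whole INTERVAL."""
        if sojourn_time < TARGET:
            self.first_signal_time = None
            return False
        if self.first_signal_time is None:
            # Queue has just crossed the threshold: wait a full INTERVAL
            # before the first mark/drop, however short the flow's RTT.
            self.first_signal_time = now + INTERVAL
            return False
        return now >= self.first_signal_time

# A flow with a 10ms RTT holding the sojourn at 20ms from t=0 gets no
# signal until t=100ms, i.e. ten of its own RTTs later.
codel = CoDelSketch()
signals = [codel.should_signal(0.020, t / 1000.0) for t in range(0, 120, 10)]
```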

* PIE suppresses all drops for time max_burst (set to 100ms by default) from when the drop probability it calculates (but doesn't necessarily use) first rises above zero. This is very similar to CoDel, and similar comments are applicable.
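PIE's burst allowance can be sketched the same way (again simplified; the names are mine, and I've ignored PIE's drop-probability calculation itself):

```python
# Minimal sketch of PIE's burst allowance (simplified; only the 100ms
# default comes from PIE, names are illustrative, and the probability
# calculation itself is ignored).

MAX_BURST = 0.100  # 100ms default burst allowance

class PieBurstSketch:
    def __init__(self):
        self.burst_allowance = MAX_BURST

    def on_update(self, drop_prob, dt):
        """Called periodically with the computed drop probability."""
        if drop_prob <= 0:
            # Probability still zero: keep the allowance topped up.
            self.burst_allowance = MAX_BURST
        else:
            # Probability has risen above zero: run the allowance down
            # before any drop (or mark) is permitted.
            self.burst_allowance = max(0.0, self.burst_allowance - dt)

    def may_signal(self):
        return self.burst_allowance <= 0

pie = PieBurstSketch()
pie.on_update(0.02, 0.06)             # 60ms after probability went positive
still_suppressed = not pie.may_signal()
pie.on_update(0.02, 0.06)             # 120ms after: suppression ends
```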

* RED requires the constant for its exponentially weighted moving average (w_q) to be set taking into account how many packets are likely to arrive at the link in a 'typical' RTT. Reverse engineering the values recommended by Sally Floyd in the RED paper and in her famous RED parameters Web page <http://www.icir.org/floyd/REDparameters.txt> shows she assumed a 'typical' RTT of about 130ms.

[BTW, I know of people who don't calculate w_q, but just use the value of "0.002" that Sally recommended for her 45Mb/s link in the original RED paper simulations (and repeated at <http://www.icir.org/floyd/REDparameters.txt>). This was calculated assuming about 500 packets arrive at a link (from all flows) in a typical RTT. Links have got a lot faster since 1993. Nonetheless, she was considering 45Mb/s for an aggregated link in those days, and it happens to be about right for a single user today.]
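For anyone who wants to check the arithmetic, here is the back-of-envelope reverse engineering (the 1500-byte packet size is my assumption; only w_q = 0.002 and the 45Mb/s link come from Sally's recommendations):

```python
# Back-of-envelope reverse engineering of Sally Floyd's w_q = 0.002.
# Assumption: full-size 1500-byte packets (illustrative); the weight and
# the 45Mb/s link speed are from the original recommendations.
import math

w_q = 0.002              # recommended EWMA weight
link_bps = 45e6          # 45Mb/s link from the RED paper
pkt_bits = 1500 * 8

# An EWMA with weight w_q averages over roughly -1/ln(1 - w_q) samples.
packets_in_memory = -1 / math.log(1 - w_q)       # ~500 packets
pkt_rate = link_bps / pkt_bits                   # ~3750 packets/s
smoothing_time = packets_in_memory / pkt_rate    # ~0.133s, i.e. ~130ms
```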


> * For a flow with 1/10 or 1/100 of this RTT (e.g. from a CDN or your
> home media server), any congestion signal is delayed tens or hundreds
> of its own RTTs by these AQMs.

   Clearly, RTTs differing by a factor of ten are quite common at most
nodes traversed in a typical path; and it seems _very_ suboptimal to
have the responsibility for guessing the RTT at the node which must
drop packets.

For packets that do not support ECN, the dropping node has to guess the RTT so as not to drop packets unnecessarily, because drop is an impairment as well as a congestion signal: a transport cannot 'undrop' packets.

Our point though is that a network node doesn't have to mimic this behaviour for ECN packets, because ECN is not an impairment. So a transport can un-ECN-mark packets (by smoothing out bursts itself).
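As a sketch of what I mean by the transport smoothing for itself, here is a DCTCP-style EWMA over the flow's own RTT (the gain and the toy mark pattern are illustrative, not a definitive implementation):

```python
# Sketch of host-side smoothing of immediate ECN marks, DCTCP-style.
# The gain G and the toy mark pattern below are illustrative.

G = 1 / 16  # EWMA gain: smooths over a few of the flow's *own* RTTs

def update_alpha(alpha, marked, acked):
    """Once per RTT: fold the fraction of ECN-marked segments into alpha."""
    frac = marked / acked if acked else 0.0
    return (1 - G) * alpha + G * frac

def reduce_cwnd(cwnd, alpha):
    """Back off in proportion to smoothed congestion, not per raw mark."""
    return max(1.0, cwnd * (1 - alpha / 2))

# One RTT's burst of marks (all 10 of 10 segments) barely moves alpha,
# so a short burst does not trigger a drastic window reduction.
alpha = 0.0
for marked in [0, 0, 10, 0, 0]:    # marked segments per RTT, out of 10
    alpha = update_alpha(alpha, marked, acked=10)
cwnd = reduce_cwnd(100.0, alpha)
```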


> * A TCP flow in slow-start doesn't need the burst smoothed anyway
>   - delaying the signal just makes slow-start overshoot more
>   - a TCP in slow-start knows that it won't allow the burst to
> dissipate anyway

   A critical point! (It seems obvious to me, but is it obvious to
everyone?)

> The solution: make ECN also mean "Immediate Congestion Notification"?
> * For ECN-capable packets, shift the job of hiding bursts from network to
> host:
>   - the network signals ECN with no smoothing delay
>   - then the transport can hide bursts of ECN signals from itself

   But can we get there from here?

   The node doing the ECN notification _can't_ know how the transport
will react; and the transport receiving an ECN notification can't know
whether the forwarding node has "smoothed" the signal. (It is truly a
shame we haven't left any bits for signals like this!)

Well, we do have ECT(1) still only assigned experimentally and never used, which we could decide to use for this immediate ECN. However, first I want to see whether people think it might be feasible to just redefine the meaning of CE.

Rationale: So few buffers have ECN support turned on anyway that we should be able to redefine ECN so that many more will want to turn it on.

For those AQMs that already support ECN, we believe this retrospective change will make them only a little worse than they are already (and the operator can update them by simple reconfiguration anyway, and is more likely to do so, given these are clearly early-adopter networks).


>   - the transport knows
>     o whether it's TCP or RTP etc,
>     o whether it's in congestion avoidance or slow-start,
>     o and it knows its RTT,
>     o so it can know whether to respond immediately or to smooth the
>     signals,
>     o and if so, over what time

   Yes, but it can't know what smoothing may already have been applied.

Yes. If this is a problem, we will have to consider using ECT(1) not CE.
But it's pretty academic when so few buffers support ECN.

The tiny proportion that do support ECN will already smooth by a 'typical RTT' of about 100ms.

If a 20ms RTT flow adds smoothing over its own RTT to this, it will smooth over 120ms in total. The main problem there is not the extra 20ms; it's the original 100ms, which we won't lose unless we make this change somehow.


>   - then short RTT flows can smooth the signals with only the delay
> of their /own/ RTT
> o so they can fill troughs and absorb peaks that longer RTT flows cannot
>   - a TCP only needs to smooth the signals if in congestion avoidance
>     o in slow start, it can respond immediately, thus reducing overshoot

   This would, IMHO, improve "slow start".

> Incremental Deployment:
> * Immediate congestion notification doesn't need new AQM implementation
>   - it can use the widely implemented WRED algorithm with an
> unexpected configuration

   Bob is beginning to lose me here. Does he mean that a forwarding node
would apply WRED for both drop and ECN, but with different parameters?

> * The network classifies packets for this AQM treatment based on
> their ECN-capability
>   - Without ECN, it smooths the queue before signalling drops

   Bob has lost me now -- apparently he doesn't mean different
parameters... and I don't recognize this "smoothing" step in WRED.

I do mean that a forwarding node would apply WRED for both drop and ECN, but with different parameters.

Each WRED policy-map includes a setting for this smoothing parameter, which Cisco calls the exponential-weighting-constant. Many people don't notice it's there and they just leave it at the default. For instance, Cisco set it to 2^(-9) ~ 0.002 by default for each of the WRED policy-maps (see http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fswfq26.html#wp1039982).
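For concreteness, the averaging step that this exponential-weighting-constant controls looks like this (standard RED averaging; variable names are mine):

```python
# Sketch of the EWMA step that the exponential-weighting-constant controls
# (standard RED averaging; variable names are illustrative).

N = 9  # Cisco's default exponent: weight 2^-9 ~ 0.002

def wred_avg(avg, qlen, n=N):
    """One per-arrival update of the average queue length."""
    return avg + (qlen - avg) / (1 << n)

# With weight 2^-9 the average only crawls towards a sudden 100-packet
# standing queue: after 500 arrivals it is still well short of 100.
avg = 0.0
for _ in range(500):
    avg = wred_avg(avg, 100.0)
```

Setting a much smaller exponent for ECN-capable packets (the 'unexpected configuration' I mentioned) makes the average track the instantaneous queue, which is exactly the immediate signal we want.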


>   - With ECN, it signals immediately, without any smoothing delay
>   - (as today, the operator can still use WRED with the Diffserv field too)

   (Do we need to confuse this discussion by adding diffserv?)

A non-Diffserv network still doesn't need to worry about Diffserv.

I put this in parentheses because, if WRED is used today, it is usually used with Diffserv, and I didn't want anyone to worry that they wouldn't be able to continue to do this (e.g. BT use WRED with Diffserv in enterprise networks, as do many other carriers).


> * For TCP apps, the stack will use 'DCTCP' (we've tweaked it), if the
> ends negotiate ECN with the accurate feedback capability.

   Have we settled on "accurate feedback" already? I thought that was
still under discussion. (I don't follow exactly what it adds...)

See response from Richard Scheffenegger. Essentially the TCPM WG has accepted the requirements doc, but not decided between the mechanisms on offer.


> * It should 'just work' if an RTP app or a Reno TCP uses ECN.

   I don't see any way for a Reno transport using ECN to avoid being
starved if ECN arrives earlier (without notice).

We haven't tested legacy Reno with ECN yet (we figured legacy Reno without ECN is a lot more prevalent, so we focused on that first). Nonetheless, Reno-ECN is unlikely to starve, because starvation is about long-running behaviour, and once a flow has run for more than a couple of 100ms RTTs, the immediate ECN signals should be no different from smoothed ECN. I suspect Reno-ECN might be worse in its short-term dynamics. But remember, Reno-ECN is likely to be a tiny corner case.


> The request:
> * Much more evaluation to do, but first we want to know:
>   - if the idea works, would the IETF have an appetite for tweaking
> the definition of ECN so it is merely equivalent to drop in the long
> term, but the dynamics need not be equivalent.

   There's a good question there; but I don't think we're ready for it.

At this stage, even we haven't got many answers. So I'm not asking the IETF to answer the question right now. I'm merely saying, /if/ our idea works, is there at least an /appetite/ in the IETF for reconsidering the definition of ECN?

We wanted to make the IETF aware of this research early, because it might want to at least hold off on any actions that would otherwise close off this option.

And if we find that any change is completely out of the question, we have to try a different tack (e.g. ECT(1)).


   I'd really like to discuss the dynamics of responding more quickly
but perhaps less drastically for almost any real-time flow.

   But proving "equivalence in the long term" seems too hard.

This should be the easy part: the longer conditions remain stable, the closer a smoothed signal tends to an unsmoothed one, all other factors being equal.

Equivalence during dynamics is the hard part, and I'm suggesting we don't sweat too much about that, as long as the performance evaluations are not too far apart.


> Much better than the ECN that didn't get deployed
> * This is Explicit and Immediate Congestion Notification (EICN?)
>   - same wire protocol, much greater benefits
> * The advantage of the original ECN (avoiding congestive loss) was
> too small to be worth the deployment hassle

   Actually, I don't agree that was the problem -- instead I believe
the code has been deployed but administratively suppressed because
the operators don't trust the transports. There _is_ a significant
improvement from one-RTT reaction instead of several (to detect a
drop), but the whole process is just too complicated, while the
opportunity for abuse remains obvious.

I agree. That's the 'deployment hassle' side of my sentence - the extra trust-enhancing mechanisms that seemed necessary were too much pain for the small gain.


> * Predictable ultra-low latency without loss too (similar to
> DCTCP-ECN) would be worth deploying

   I'm optimistic that latency will become an easier argument.

> * But we all thought DCTCP could only be deployed in isolation (e.g.
> data centres)
>   - we all thought DCTCP traffic would starve alongside today's TCP traffic
>   - because in a DCTCP queue, the ECN threshold is lower than you
> would trigger drop
>   - and we thought ECN & drop had to be equivalent.

   (I'm not sure we'll succeed at breaking that "equivalence"...)

> * We believe we've found a way to ensure DCTCP-ECN traffic doesn't starve
>   - we still make DCTCP-ECN equivalent to drop in the long-run, but
> not in its dynamics

   (I'm still not sure it's worth arguing the "long-run".)

I mean competing long-running ECN & non-ECN flows stabilise at predictable rates, rather than one ratcheting itself down to nothing over time (starvation).

That's the primary concern of congestion control 'fairness', before anyone starts worrying about what the relative rates are. Given apps get different relative rates with different RTTs, with different size objects or by opening multiple flows, we don't need to sweat so much about precisely equal flow rates; but we must sweat about stable convergence.

Results so far show that the proposed idea is at least very robust against starvation.


Bob


--
John Leslie <[email protected]>

________________________________________________________________
Bob Briscoe, BT
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
