As is probably well known at this point, I make a clear distinction
between networking problems in the data center and those in the wild
and woolly world outside it.

The data center portion of the universe is a couple hundred meters in
diameter, the other, I dunno, let's say 8x from here to the moon
(3,075,200,000m).

There are all sorts of things that work and are needed in the data
center that probably won't work outside it. People run straighter
cables, use microwave links, use pause frames at layer 2, and so on,
to wring out the last nanosecond of performance, and certainly running
without loss is important in that world.

On Sun, Oct 27, 2013 at 6:36 PM, Bob Briscoe <[email protected]> wrote:
> John, inline...
>
> At 12:16 26/10/2013, John Leslie wrote:
>>
>> Bob Briscoe <[email protected]> wrote:
>> >
>> > Exec summary
>> > * Early tests show promise that we may have found a way to make the

I'm failing to get excited in the absence of a paper, testable code,
and other proof.

>> > ultra-low queuing delay of data centre TCP incrementally deployable
>> > on the public Internet

At the moment the edge of the internet is using things like cable and
GPON, which have 2ms or more of inherent latency built into their
grant/response structures.

>> > * For rtcweb, we need to address
>> >   a) cc for r-t media [rmcat w-g in progress]
>> >   b) Making TCP nicer
>> >   c) minimise ability of TCP to bloat queues [AQM w-g now in progress]
>> >   This addresses b) & c)

I do strongly feel that WebRTC needs good AQM and packet scheduling in
order to succeed, that the WebRTC folk should be testing what has
already been developed (RED, SFQRED, CoDel, fq_codel, PIE), and that
the AQM folk should be testing the WebRTC code.

I look forward to some interesting conference calls.

I gave the WebRTC codebases a whirl this past summer against the AQM
and packet scheduling techniques we already have. The initial results
were encouraging, but I was easily able to crash most of the browsers
before I could take a good, repeatable set of measurements. That said,
things over there are moving along smartly.

>> >
>> > The problem
>> > * All AQMs delay dropping for about one (hard-coded) worst-case RTT,
>> > in case a burst dissipates (allegedly a 'good queue' according to Van
>> > Jacobson)
>>
>>    This assertion is going to need a lot of support.
>>
>>    Bob is a man after my own heart suggesting that an ECN notification
>> may be sent earlier than a packet drop would be indicated. I don't know
>> if we can get there; but IMHO that is essential to getting ECN deployed
>> and used.
>>
>>    I don't think I agree with Bob that what's hard-coded is necessarily
>> a "worst-case" RTT -- and I'm quite sure I'm not willing to make any
>> pronouncement about "all AQMs".
>>
>>    I suggest the talk might be more useful if Bob outlined the AQMs
>> currently in widespread use and detailed _how_ they delay dropping
>> for an estimated RTT.
>
>
> You're right. The 'about' in my sentence was meant to indicate some leeway.
> The specifics depend on each AQM...
>
> The only AQM I know of that doesn't smooth over some nominal RTT is DCTCP
> itself.

DCTCP fits uncomfortably into the AQM category.

>
> * CoDel was designed for 'interval' to be a worst-case (largest) RTT, which
> it recommends to be set to 100ms. Once the queue has exceeded the threshold,

The corresponding default target is 5ms of delay.

> CoDel delays for time 'interval' before starting to signal congestion.

Which it then decreases until it finds something approximating the
RTT. For a system in a reasonably steady state, it will find a decent
value and stay there.

For bursty traffic it is not necessarily helpful, and slow start is
interesting...
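That control law is easy to sketch (a simplified model with my own
variable names, not the Linux implementation):

```python
# Simplified sketch of CoDel's control law (not the Linux code): the
# first signal comes a full `interval` after the sojourn time exceeds
# `target`, then signals come closer together as interval/sqrt(count),
# which is how CoDel homes in on something approximating the flow RTT.
from math import sqrt

INTERVAL = 0.100  # 100 ms: the recommended worst-case RTT
TARGET = 0.005    # 5 ms: the acceptable standing-queue delay

def signal_times(n):
    """Times (s) of the first n congestion signals, measured from when
    the sojourn time first exceeds TARGET, assuming it stays above."""
    t, times = INTERVAL, []
    for count in range(1, n + 1):
        times.append(t)
        t += INTERVAL / sqrt(count + 1)
    return times
```

Note the cost for short-RTT flows: for a 5ms-RTT flow, that first
100ms of silence is twenty of its own RTTs.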

>  - I already said how sluggish CoDel would be for a flow with a much shorter

But you've never published a measurement, unless there is one in your new paper?

> RTT than CoDel (others have made this point, e.g. for data centres :
> https://lists.bufferbloat.net/pipermail/codel/2012-August/000448.html).

You might want to read the full thread... a stumbling block in fixing
the srtt was the stochastic hashing... so now there is a full-blown,
non-approximated FQ scheduler that implements pacing.

http://www.ietf.org/mail-archive/web/aqm/current/msg00259.html

And I look forward to a certain upcoming presentation in ICCRG with
great anticipation as to additional fallout from this....

>  - And in the other direction, we already know that utilisation suffers
> fairly badly for flows with RTT significantly larger than 100ms.

"target" and "interval" have always been variables in the fq_codel and
codel codebases, and ECN is supported:

tc qdisc add dev your_device root fq_codel target 500us interval 10ms ecn

I don't think the pain of adding that one line of configuration would
affect a data center deployment at all. I would certainly like to see
benchmarks in a data center environment; I personally lack 10GigE
hardware.

I note that once you get below 1ms on a typical box today, you start
hitting other bottlenecks in (for example) the CPU scheduler.

> * PIE suppresses all drops for time max_burst (set to 100ms by default) from
> when the drop probability it calculates (but doesn't necessarily use) first
> rises above zero. This is very similar to CoDel, and similar comments are
> applicable.

This too has a configurable target value, although there are several
other magic constants that I don't fully understand. I would certainly
like to see some benchmarks in a data center environment. The PIE code
I have for Linux shoots for a target of 20ms by default, for some
reason.
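For reference, the core of PIE's periodic probability update is only a
couple of lines; the constants below are illustrative values from the
draft, not necessarily what the Linux code ships:

```python
# Sketch of PIE's periodic drop-probability update (after
# draft-pan-aqm-pie): p grows when queuing delay is above target
# and/or trending upward, and decays otherwise. ALPHA and BETA are
# two of the "magic constants"; TARGET here matches the 20ms Linux
# default mentioned above.
TARGET = 0.020             # 20 ms queue-delay target
ALPHA, BETA = 0.125, 1.25  # weights on the error and trend terms

def update_drop_prob(p, qdelay, qdelay_old):
    """One update interval; delay values in seconds."""
    p += ALPHA * (qdelay - TARGET) + BETA * (qdelay - qdelay_old)
    return min(max(p, 0.0), 1.0)
```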

> * RED requires the constant for its exponentially weighted moving average
> (w_q) to be set taking into account how many packets are likely to arrive at
> the link in a 'typical' RTT. Reverse engineering the values recommended by
> Sally Floyd in the RED paper and in her famous RED parameters Web page
> <http://www.icir.org/floyd/REDparameters.txt>, she recommended a 'typical'
> RTT of about 130ms.
>
> [BTW, I know of people who don't calculate w_q, but just use the value of
> "0.002" that Sally recommended for her 45Mb/s link in the original RED paper
> simulations (and repeated at <http://www.icir.org/floyd/REDparameters.txt>).
> This was calculated assuming about 500 packets arrive at a link (from all
> flows) in a typical RTT. Links have got a lot faster since 1993.
> Nonetheless, she was considering 45Mb/s for an aggregated link in those
> days, and it happens to be about right for a single user today.]
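That reverse-engineering is easy to reproduce (a sketch; the
1500-byte packet size is my assumption):

```python
# Reproduce the arithmetic above: an EWMA with weight w_q averages
# over roughly -1/ln(1 - w_q) packet arrivals; at 45 Mb/s with
# 1500-byte packets (my assumption), that window is a 'typical' RTT.
from math import log

w_q = 0.002                      # Sally Floyd's recommended weight
window_pkts = -1 / log(1 - w_q)  # ~500 packets
link_bps = 45e6                  # the 45 Mb/s link from the RED paper
pkt_bits = 1500 * 8
window_secs = window_pkts * pkt_bits / link_bps  # ~0.133 s

# i.e. ~500 packets and ~133 ms -- the "typical RTT of about 130ms"
```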
>
>
>> > * For a flow with 1/10 or 1/100 of this RTT (e.g. from a CDN or your
>> > home media server), any congestion signal is delayed tens or hundreds
>> > of its own RTTs by these AQMs.

Congestion takes time to occur. Consider TCP dynamics, too: flows
only increase their rate relative to the RTT.

A core question is what do you want response time to congestion to be under?

100ms? 5ms? 1ms? 500us?

>>    Clearly, RTTs differing by a factor of ten are quite common at most
>> nodes traversed in a typical path; and it seems _very_ suboptimal to

2.4 billion people connect to the internet, generally at RTTs between
20 and 80ms.

In a data center environment I have no data.

>> have the responsibility for guessing the RTT at the node which must
>> drop packets.

I certainly favor the ongoing development of end-to-end congestion
control capable of monitoring the RTT and doing the right thing.

> For packets that do not support ECN, the dropping node has to make a guess
> at the RTT, so as not to drop packets unnecessarily, because drop is an
> impairment as well as a congestion signal. So a transport cannot 'undrop'
> packets.
>
> Our point though is that a network node doesn't have to mimic this behaviour
> for ECN packets, because ECN is not an impairment. So a transport can
> un-ECN-mark packets (by smoothing out bursts itself).
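The "transport smooths the bursts itself" idea is essentially what
DCTCP's sender already does; a sketch of the per-RTT logic (names are
mine, the gain G is DCTCP's recommended 1/16):

```python
# Sketch of transport-side smoothing of immediate ECN signals, in the
# style of DCTCP: once per RTT the sender folds the fraction of
# CE-marked packets into a running estimate (alpha), then cuts cwnd
# in proportion to alpha rather than halving on every mark -- hiding
# bursts of signals over the flow's /own/ RTT, not a worst-case one.
G = 1 / 16  # EWMA gain

def on_rtt_end(alpha, marked, acked, cwnd):
    """Per-RTT update: marked/acked is the raw (bursty) signal,
    alpha the estimate smoothed over the flow's own RTTs."""
    frac = marked / acked if acked else 0.0
    alpha = (1 - G) * alpha + G * frac
    if marked:
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    return alpha, cwnd
```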
>
>
>> > * A TCP flow in slow-start doesn't need the burst smoothed anyway
>> >   - delaying the signal just makes slow-start overshoot more
>> >   - a TCP in slow-start knows that it won't allow the burst to
>> > dissipate anyway

The end of slow start is an ECN notification or a packet drop. What am
I missing here?

>>    A critical point! (It seems obvious to me, but is it obvious to
>> everyone?)

>>
>> > The solution: make ECN also mean "Immediate Congestion Notification"?
>> > * For ECN-capable packets, shift the job of hiding bursts from network
>> > to
>> > host:
>> >   - the network signals ECN with no smoothing delay
>> >   - then the transport can hide bursts of ECN signals from itself
>>
>>    But can we get there from here?
>>
>>    The node doing the ECN notification _can't_ know how the transport
>> will react; and the transport receiving an ECN notification can't know
>> whether the forwarding node has "smoothed" the signal. (It is truly a
>> shame we haven't left any bits for signals like this!)
>
>
> Well, we do have ECT(1) still only assigned experimentally and never used,
> which we could decide to use for this immediate ECN. However, first I want
> to see whether people think it might be feasible to just redefine the
> meaning of CE.

A lot of the ECN nonce logic has been discussed in other RFCs.

>
> Rationale: So few buffers have ECN support turned on anyway that we should
> be able to redefine ECN so that many more will want to turn it on.
>
> For those AQMs that already support ECN, we believe this retrospective
> change will make them only a little worse than they are already (and the
> operator can update them by simple reconfiguration anyway, and is more
> likely to do so, given these are clearly early-adopter networks).
>
>
>> >   - the transport knows
>> >     o whether it's TCP or RTP etc,
>> >     o whether its in congestion avoidance or slow-start,
>> >     o and it knows its RTT,
>> >     o so it can know whether to respond immediately or to smooth the
>> >     signals,
>> >     o and if so, over what time
>>
>>    Yes, but it can't know what smoothing may already have been applied.
>
>
> Yes. If this is a problem, we will have to consider using ECT(1) not CE.
> But it's pretty academic when so few buffers support ECN.
>
> The tiny proportion that do support ECN will already smooth by a 'typical
> RTT' of about 100ms.
>
> If a 20ms RTT flow adds smoothing over its own RTT to this, it will be
> smooth over 120ms.
> The main problem there is not the extra 20ms, it's the original 100ms, which
> we won't lose unless we make this change somehow.
>
>
>> >   - then short RTT flows can smooth the signals with only the delay
>> > of their /own/ RTT
>> >     o so they can fill troughs and absorb peaks that longer RTT flows
>> > cannot
>> >   - a TCP only needs to smooth the signals if in congestion avoidance
>> >     o in slow start, it can respond immediately, thus reducing overshoot
>>
>>    This would, IMHO, improve "slow start".
>>
>> > Incremental Deployment:
>> > * Immediate congestion notification doesn't need new AQM implementation
>> >   - it can use the widely implemented WRED algorithm with an
>> > unexpected configuration
>>
>>    Bob is beginning to lose me here. Does he mean that a forwarding node
>> would apply WRED for both drop and ECN, but with different parameters?
>>
>> > * The network classifies packets for this AQM treatment based on
>> > their ECN-capability
>> >   - Without ECN, it smoothes the queue before signalling drops
>>
>>    Bob has lost me now -- apparently he doesn't mean different
>> parameters... and I don't recognize this "smoothing" step in WRED.
>
>
> I do mean that a forwarding node would apply WRED for both drop and ECN, but
> with different parameters.
>
> Each WRED policy-map includes a setting for this smoothing parameter, which
> Cisco calls the exponential-weighting-constant. Many people don't notice
> it's there and they just leave it at the default. For instance, Cisco set it
> to
> 2^(-9) ~ 0.002 by default for each of the WRED policy-maps (see
> http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fswfq26.html#wp1039982).
>
>
>> >   - With ECN, it signals immediately, without any smoothing delay
>> >   - (as today, the operator can still use WRED with the Diffserv field
>> > too)
>>
>>    (Do we need to confuse this discussion by adding diffserv?)
>
>
> A non-Diffserv network still doesn't need to worry about Diffserv.
>
> I put this in parentheses because, if WRED is used today, it is usually used
> with Diffserv, and I didn't want anyone to worry that they wouldn't be able
> to continue to do this (e.g. BT use WRED with Diffserv in enterprise
> networks, as do many other carriers).

Perhaps we need a taxonomy so we can all talk about the areas of the
network we care about? RED as currently defined will not work
correctly on variable-rate networks, like wireless...

I have never thought we'd end up with one AQM or TCP to rule them
all...

DC: Data Center
WI: Wireless
FL: Fixed Line

>
>> > * For TCP apps, the stack will use 'DCTCP' (we've tweaked it), if the
>> > ends negotiate ECN with the accurate feedback capability.
>>
>>    Have we settled on "accurate feedback" already? I thought that was
>> still under discussion. (I don't follow exactly what it adds...)
>
>
> See response from Richard Scheffenegger. Essentially the TCPM WG has
> accepted the requirements doc, but not decided between the mechanisms on
> offer.
>
>
>> > * It should 'just work' if an RTP app or a Reno TCP uses ECN.
>>
>>    I don't see any way for a Reno transport using ECN to avoid being
>> starved if ECN arrives earlier (without notice).
>
>
> We haven't tested legacy Reno with ECN yet (we figured legacy Reno without
> ECN is a lot more prevalent, so focused on this first). Nonetheless,
> Reno-ECN is unlikely to starve, because starvation is about long-running
> behaviour, and once a flow has run for more than a couple of 100ms RTTs, the
> immediate ECN signals should be no different from a smoothed ECN. I suspect
> Reno-ECN might be worse in its short-term dynamics. But remember Reno-ECN is
> likely to be a tiny corner-case.
>
>
>> > The request:
>> > * Much more evaluation to do, but first we want to know:
>> >   - if the idea works, would the IETF have an appetite for tweaking
>> > the definition of ECN so it is merely equivalent to drop in the long
>> > term, but the dynamics need not be equivalent.
>>
>>    There's a good question there; but I don't think we're ready for it.
>
>
> At this stage, even we haven't got many answers. So I'm not asking the IETF
> to answer the question right now. I'm merely saying, /if/ our idea works, is
> there at least an /appetite/ in the IETF for reconsidering the definition of
> ECN?

I wouldn't mind it, but frankly my own concern was addressing the
security issue, not the current definition.

> We wanted to make the IETF aware of this research early, because it might
> want to at least hold off on any actions that would otherwise close off this
> option.


>
> And if we find that any change is completely out of the question, we have to
> try a different tack (e.g. ECT(1)).
>
>
>>    I'd really like to discuss the dynamics of responding more quickly
>> but perhaps less drastically for almost any real-time flow.
>>
>>    But proving "equivalence in the long term" seems too hard.
>
>
> This should be the easy part, because the longer that conditions are stable,
> a smoothed signal should tend towards an unsmoothed signal, all other
> factors being equal.
>
> Equivalence during dynamics is the hard part, and I'm suggesting we don't
> sweat too much about that, as long as the performance evaluations are not
> too far apart.
>
>
>> > Much better than the ECN that didn't get deployed
>> > * This is Explicit and Immediate Congestion Notification (EICN?)
>> >   - same wire protocol, much greater benefits
>> > * The advantage of the original ECN (avoiding congestive loss) was
>> > too small to be worth the deployment hassle
>>
>>    Actually, I don't agree that was the problem -- instead I believe
>> the code has been deployed but administratively suppressed because
>> the operators don't trust the transports. There _is_ a significant
>> improvement from one-RTT reaction instead of several (to detect a
>> drop), but the whole process is just too complicated, while the
>> opportunity for abuse remains obvious.
>
>
> I agree. That's the 'deployment hassle' side of my sentence - the extra
> trust-enhancing mechanisms that seemed necessary were too much pain for the
> small gain.


>
>
>> > * Predictable ultra-low latency without loss too (similar to
>> > DCTCP-ECN) would be worth deploying
>>
>>    I'm optimistic that latency will become an easier argument.
>>
>> > * But we all thought DCTCP could only be deployed in isolation (e.g.
>> > data centres)
>> >   - we all thought DCTCP traffic would starve alongside today's TCP
>> > traffic
>> >   - because in a DCTCP queue, the ECN threshold is lower than you
>> > would trigger drop
>> >   - and we thought ECN & drop had to be equivalent.

I'm looking forward to being convinced otherwise.

>>    (I'm not sure we'll succeed at breaking that "equivalence"...)
>>
>> > * We believe we've found a way to ensure DCTCP-ECN traffic doesn't
>> > starve
>> >   - we still make DCTCP-ECN equivalent to drop in the long-run, but
>> > not in its dynamics
>>
>>    (I'm still not sure it's worth arguing the "long-run".)
>
>
> I mean competing long-running ECN & non-ECN flows stabilise at predictable
> rates, rather than one ratcheting itself down to nothing over time
> (starvation).
>
> That's the primary concern of congestion control 'fairness', before anyone
> starts worrying about what the relative rates are. Given apps get different
> relative rates with different RTTs, with different size objects or by
> opening multiple flows, we don't need to sweat so much about precisely equal
> flow rates; but we must sweat about stable convergence.
>
> Results so far show that the proposed idea is at least very robust against
> starvation.
>
>
> Bob
>
>
>> --
>> John Leslie <[email protected]>
>
>
> ________________________________________________________________
> Bob Briscoe,                                                  BT
> _______________________________________________
> aqm mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/aqm



-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
