Dave,

At 22:11 12/12/2013, Dave Taht wrote:
but quickly...

Bob, I object to your characterization of users links being busy 1-3%
of the time. That's an average.

I said it was an average. You're repeating and agreeing with what I said, but saying you object to me saying it?

When they are busy, they are very busy
for short periods, typically 2-16 seconds in the case of web traffic,
then idle for minutes. DASH traffic is busy for 2+ seconds every 10 on
a 20mbit link, and so on, for 1.5 hours or so. Etc.

Yes, again, you're agreeing with me.

The mean for a Web session is towards the low end of the 2-16 seconds range even now. And as we get the other latency-saving advances out there (e.g. removing TCP & TLS handshakes, proper pipelining, and a faster replacement for slow-start without overshoot), then there is potential for Web sessions to drop to 1-2 seconds or less, because they're usually a long way from being bandwidth limited.

You said fq_codel decays out its memory in 800ms. So it will typically have lost all its memory when each Web transfer starts and when each DASH transfer re-starts.

So I think you're agreeing that this 200ms signalling delay will be predominant?
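To make the arithmetic behind this concrete (an editorial sketch, not text from the thread): with CoDel-style behavior, the first congestion signal for a fresh burst arrives roughly one 'interval' after the queue first exceeds target, plus one RTT for the sender to see it. Using the 200 ms figure under discussion and a 20 ms path:

```python
# Editorial sketch: rough time before a CoDel instance can emit its first
# congestion signal for a fresh burst, assuming the queue exceeds 'target'
# from the moment the burst arrives and the instance's state has already
# decayed (no memory of prior drops, per the 800 ms decay discussed above).

def first_signal_delay_ms(interval_ms, rtt_ms):
    """CoDel waits one full 'interval' of above-target sojourn before the
    first drop/mark; the sender then needs ~1 RTT to see the signal."""
    return interval_ms + rtt_ms

# The numbers in this thread: a 200 ms interval and a 20 ms CDN-ish RTT.
print(first_signal_delay_ms(200, 20))   # -> 220
```

Since each Web burst typically starts after the state has decayed, every burst would pay this full delay again.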

In both cases
congestion exists, and in both cases AQM reaction time measured in
200ms or so is still vastly superior to what happens today, and packet
scheduling masks it to a large extent.

You're saying it's OK to propose a solution that delays signalling congestion for about 10 typical CDN RTTs... because it's better than nothing?

That's rich. I have to say this... You're one of the group of protagonists who has persuaded the world to embark on a programme of /implementation/ updates that will take years, and rubbished deploying the AQM that was already implemented (RED), even though it was already much better than nothing too.

Yes, auto-config for line rate is a nice bell to add to the bicycle. If we are going to embark on new implementations, auto-config for RTT is no less important.

Lack of auto-config for either requires the fixed config to be at the slow end of the range. And both have a similar range of variability (the RTT range is actually wider). So lack of auto-config in either case leads to a similar order of unnecessary delay. Particularly given traffic is predominantly in short sparse bursts.



Bob



Jim, yes, I was trying to establish the groundwork for ensuring
everyone really understood codel-by-itself before talking about
fq_codel. I'm still not sure that's been established. Anyone here care
to calculate the number of drops on two flows going through codel and
fq_codel, starting with iw10, over a 10mbit 2ms RTT link? And when,
approximately, the queue becomes ideal? And how often the fq_codel
"fast queue" gets used in this case? Gold stars for everyone that gets
it right.
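A back-of-envelope starting point for Dave's exercise (editorial; it assumes 1500-byte packets and CoDel's usual 5 ms target, and only shows the scale of the initial-window burst, not the exact drop count):

```python
# Editorial back-of-envelope for the "gold star" exercise above.
# Assumptions (not from the thread): 1500-byte packets, 5 ms CoDel target.
# Real codel/fq_codel behavior differs in detail; this only shows the
# scale of the queue the two initial windows create.

LINK_BPS = 10_000_000      # 10 Mbit/s link
PKT_BYTES = 1500
IW = 10                    # iw10
FLOWS = 2
TARGET_MS = 5.0            # CoDel's usual target

pkt_ms = PKT_BYTES * 8 / LINK_BPS * 1000   # serialization time per packet
burst_pkts = IW * FLOWS                    # both flows' initial windows
burst_drain_ms = burst_pkts * pkt_ms       # queue depth in time units

print(f"per-packet time: {pkt_ms:.2f} ms")            # 1.20 ms
print(f"initial burst:   {burst_drain_ms:.1f} ms")    # 24.0 ms of queue
print(f"above target?    {burst_drain_ms > TARGET_MS}")
```

With a 2 ms RTT the burst drains in well under CoDel's interval, which is part of why the answer to "how many drops?" is less obvious than it first appears.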

Jim, I'd like you to use larger download speeds than a megabit for your
examples; somewhere between 8 and 20mbit seems appropriate. (IMHO iw10
should not be used on the modern internet on sub 10Mbit links.) The
dynamics change significantly as you get more bandwidth than iw10
abuses.

Anyway…

After I got as far as describing fq_codel accurately in this thread,
then I'd hoped to be able to tackle the immediate ECN issue, the value
of randomness in pie, and the effectiveness and need for ECN on the
edge as it is currently defined.

and I figured that then I might have written enough to get closer to an rfc.

and when I started kibitzing on this thread I thought I was talking to
the DCTCP case which I've spent a few months studying up on, and
looking over alternative ideas there, like

http://conferences.sigcomm.org/co-next/2013/program/p49.pdf
http://conferences.sigcomm.org/co-next/2013/program/p151.pdf

Scalable, Optimal Flow Routing in Datacenters via Local Link Balancing

http://www.irt-systemx.fr/wp-content/uploads/2013/12/AINTEC.ppt

Sigh. I'll

On Thu, Dec 12, 2013 at 11:35 AM, Jim Gettys <[email protected]> wrote:
>
>
>
> On Wed, Dec 11, 2013 at 2:21 PM, Bob Briscoe <[email protected]> wrote:
>>
>> Jim,
>>
>>
>> At 16:55 11/12/2013, Jim Gettys wrote:
>>
>>
>>
>> On Tue, Dec 10, 2013 at 10:04 PM, Bob Briscoe <[email protected]> wrote:
>> Jim,
>>
>> I'm just checking we're not talking past each other. I'll repeat two
>> quotes from each of us, then comment.
>>
>> On Thu, Dec 5, 2013 at 1:13 PM, Bob Briscoe <[email protected]> wrote:
>>
>> 3{New}. It SHOULD be possible to make different instances of an AQM
>> algorithm apply to different subsets of packets that share the same queue.
>> It SHOULD be possible to classify packets into these subsets at least by ECN
>> codepoint [RFC3168] and Diffserv codepoint [RFC2474] (or the equivalent of
>> these fields at lower layers),
>>
>>
>> At 19:50 05/12/2013, Jim Gettys wrote:
>>
>> "Certainly, it may be the same instance of an AQM algorithm, rather than
>> different instances, for example."
>>
>>
>> That's true of course, but the case with one AQM handling all packets
>> within a queue is the norm. I want to check you're happy with the converse:
>> 1) A set-up more like WRED which was based on Dave Clark's RIO (RED with
>> in and out of contract). So we can have WPIE, WCoDel etc where the
>> differentiation between aggregates is provided by different AQM instances in
>> the same queue, not by different queues with different scheduling
>> priorities.
>> 2) Extending this so that AQM differentiation can be between ECN-capable
>> and Not-ECN-capable aggregates, not just between Diffserv classes (an
>> example being CoDel with a lower 'interval' for ECN-capable packets).
>>
>> I presented the evaluations of this last idea in tsvwg on the final Friday
>> of the Vancouver IETF - I don't think you were there. <
>> http://www.ietf.org/proceedings/88/slides/slides-88-tsvwg-20.pdf >
>>
>>
>> Yes, unfortunately I had to leave before the Friday session.
>> This is my primary motivation for this wordsmithing - I'm trying to allow us
>> to move towards zero signalling delays in CoDel, PIE and RED (currently
>> defaults of 200ms, 100ms and 512packets respectively, which are not good for
>> dynamics).
>>
>>
>> Certainly signalling delays are very important: this is why I'm favorably
>> inclined to "head mark/drop", as it signals TCP as quickly as possible,
>> keeping the response of the TCP feedback loop as tight as possible (and part
>> of why I like CoDel so much for the highly variable bandwidth problem we
>> face at the edge of the net).
>>
>> It's *really* important that when the bandwidth drops suddenly,
>> everyone gets told to slow down quickly (exactly how quickly probably
>> depends on the propagation change characteristics of the medium), or packets
>> can pile up in a big way.
>>
>> How quickly the mark/drop algorithm can figure out that signalling is
>> appropriate is the *other* piece of getting good dynamics.  Here I don't
>> doubt that something may be discovered that is better than CoDel in the
>> slightest.
>> It takes a CoDel instance (within an fq structure) 200ms from its queue
>> first passing 'threshold' before it will ever drop the first packet (unless
>> the queue hits taildrop before that). So if the RTT is 20ms, that's 220ms
>> signalling delay.
>
>
> No, again, see Dave's mail, and you are missing the flow scheduling aspect
> of this and thinking in terms of a single queue and the usual mark/drop
> cases; this is the exact cognitive problem I'm talking about.
>
> The flow scheduling aspect of fq_codel is *more* important than what
> mark/drop algorithm decides to signal the TCP's to regulate their servo
> systems.  I really wish this algorithm had been called "fs_codel", rather
> than "fq_codel", as it is so easy to confuse "fair" with "flow", and "queue"
> with "scheduling".
>
> Regulating TCP is absolutely essential to keep TCP "sane" and responsive,
> but it isn't the most important part of what is going on.
>
> Here's the case of a new TCP flow on a previously idle link: After the TCP
> open handshake, you will have no more than 4 or (if IW10 is active) 10 packets
> for that flow in the queue (actually, 3 packets, as presumably at least one
> packet is in process of transmission; @ 1Mbps, that first full size packet
> takes 13ms).
>
> If another flow starts, it will get scheduled preferentially over the existing
> flow(s) until it has also built a queue, at which time it competes "fairly"
> with the other flows in the packet scheduling.
> This means, in the simple low bandwidth case, that as soon as you start the
> second flow, it gets the next possible opportunity to be transmitted in
> preference to any flow that has built a queue.
>
> Here's the kicker:
>
> We've just avoided 3 packets' worth of "head of line blocking" latency for the
> second flow to "do its thing".  @1Mbps, that is ~40ms (3*13ms) saved just in
> the TCP open, and more in that the first packet from the second flow then
> gets scheduled immediately, getting that TCP moving.
>
> Similarly for your voip packet; it saves those 40ms. And your bulk flow gets
> its first packet through ASAP as well; for web traffic, that usually
> contains the size information or other metadata required for the web browser
> to unblock.  And that is for a *single* competing TCP connection.
>
> Now, let's look at today's web: there are many embedded objects in a page.
> Once the base page is downloaded, the web browser (presuming its DNS lookups
> are already cached), opens a pile of connections all at once.  Whether I
> like it or not, (which I don't, as I tried to prevent it with pipelining in
> HTTP/1.1 in the 1990's), you can easily have 5-30 TCP connections start
> almost simultaneously. See the connections/page plot on my web page
> explaining all this stuff in more detail at:
> http://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/
>
> Without any TCP flow control ever happening, I can easily get 40 (or even
> several hundred packets!) arriving at near line rate (the initial window *
> number of embedded objects). Most of these TCP connections *won't ever even
> get out of slow start*; the objects are typically pretty small.  I've
> observed transient latency from HOL blocking of > 100ms on 50Mbps cable
> service, on some web pages.
>
> Nothing we will/can do at the TCP level can help this situation!!! Nothing,
> other than in the long term getting the incentives right so that fewer TCP
> connections might be preferable to gaming TCP, as the web already has.
>
> That CoDel may take a while to figure out that it should start marking in
> the idle case is really pretty irrelevant; the packets have already arrived
> and unless we do better packet scheduling, you are fully stuck.  Dave's
> trying to also explain that a number of people's understanding of how CoDel
> works has been wrong.
>
> But as I say, *the details of the mark/drop algorithm doesn't matter*.
> Having 3, or 100 packets queued in a FIFO queue will already mean you are
> screwed for anything low latency at the bandwidths we all care about.
>
> Once a flow drains to zero, it again gets treated as a new flow.
> And what we choose to define a "flow" to be is arbitrary, though the code
> today does the usual 5tuple.
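[The scheduling behavior Jim describes above — new and re-appearing flows serviced ahead of backlogged ones, flows keyed by 5-tuple, a drained flow becoming "new" again — can be sketched as a toy model. This is an editorial illustration only, not the Linux fq_codel implementation: it omits the byte quantum and the per-queue CoDel state.]

```python
# Toy model (editorial) of the flow-scheduling idea: flows map to queues;
# queues for newly-seen flows are serviced ahead of queues that have
# already built a backlog.

from collections import deque

class ToyFQ:
    def __init__(self):
        self.queues = {}          # flow key (e.g. 5-tuple) -> deque of packets
        self.new_flows = deque()  # flows that just (re)appeared
        self.old_flows = deque()  # flows with an established backlog

    def enqueue(self, flow, pkt):
        q = self.queues.get(flow)
        if q is None or not q:
            # an empty queue means the flow is treated as new again
            if flow not in self.new_flows and flow not in self.old_flows:
                self.new_flows.append(flow)
        self.queues.setdefault(flow, deque()).append(pkt)

    def dequeue(self):
        for lst in (self.new_flows, self.old_flows):
            while lst:
                flow = lst[0]
                q = self.queues[flow]
                if q:
                    pkt = q.popleft()
                    if lst is self.new_flows:
                        # a serviced new flow joins the old-flow rotation
                        self.new_flows.popleft()
                        self.old_flows.append(flow)
                    else:
                        lst.rotate(-1)   # round-robin among old flows
                    return pkt
                lst.popleft()            # lazily drop empty flows
        return None
```

In this toy, a packet from a freshly started flow jumps ahead of a bulk flow's backlog, which is the head-of-line saving being argued for.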
>
>> In fq_codel this creates considerable self-delay for short flows or real-time
>> apps, which kill their own latency before they get any loss signal to tell
>> them to slow down.
>
>
> Not likely in the common case.  See Dave's comments and also note that the
> initial burst of web page load runs the queue up high immediately.
>
> And as I keep saying, and will say again, the flow queuing decisions
> avoiding HOL blocking explained are much more important to dealing with the
> latency problems.
>
> So go invent better algorithms than CoDel for drop/mark, that can be applied
> to the flow queuing parts of the algorithm, and I'm happy.  Running code
> please.
>
>
>
>>
>> Even for elastic flows, with congestion signals delayed by so much, they
>> risk hitting themselves with a huge train of overshoot loss. This would be
>> the same for fq_pie, except the number is 100ms + RTT.
>
>
> When fq_pie exists to test, I'll be happy to see.
>
> I keep saying, and I'll say again: while some mark/drop algorithm needs to
> exist, flow scheduling is more important to getting good latency....
>
>
>>
>>
>> Yes, the e2e transport could measure delay growth, but it doesn't know
>> whether the delay is coming from a queue that is isolated from others or
>> not. So it doesn't want to slow down too quickly in response to delay growth
>> in case it gets screwed by other traffic. I.e. using delay growth as a signal
>> entails considerable signalling delay due to all the uncertainty.
>>
>> The proposal you missed in tsvwg was to define ECN as an immediate signal
>> from the network, 'interval'=0 in CoDel terms, so the host always gets
>> congestion signals as fast as possible, and if it needs bursts of signals
>> smoothed out, it can do that itself.
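[One hypothetical shape for that host-side smoothing, as an editorial sketch: the gain g and the per-round mark fraction below are assumptions, loosely in the style of DCTCP's EWMA (DCTCP is discussed elsewhere in this thread), not anything specified here.]

```python
# Editorial sketch: if the network marks ECN immediately ('interval' = 0),
# the sender can smooth bursts of marks itself, e.g. with an EWMA of the
# fraction of marked packets per RTT round. Gain g is a hypothetical choice.

def smoothed_mark_fraction(rounds, g=0.25):
    """rounds: list of (marked_pkts, total_pkts) per RTT round."""
    alpha = 0.0
    for marked, total in rounds:
        frac = marked / total if total else 0.0
        alpha = (1 - g) * alpha + g * frac   # EWMA across rounds
    return alpha

# A one-round burst of marks only nudges the smoothed estimate:
print(smoothed_mark_fraction([(0, 10), (8, 10), (0, 10), (0, 10)]))
```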
>
>
> Yes, I get that.  I got that a year or more ago. The idea has potential
> merit. ECN as it is currently defined is not so useful.
>
> And I wish it would get a different name so we could more easily talk about
> the two different things now being called ECN.
>
> I can see having ECN marks on a burst of packets may be helpful in having
> the receiver judge things in a highly variable wireless scenario; it may
> have additional information about the medium and know that that transient
> has gone away, and that it may not be a wise idea to slow the connection at
> all.
>
>
>>
>> The suggested wording ensures all AQM implementations will allow
>> operators, vendors and users to configure such a mechanism. But I've
>> generalised it from ECN to Diffserv too (because the implementation would be
>> no different).
>
>
> As noted before, Diffserv is still interesting, even though the packet
> scheduling in fq_codel (or similar algorithms) makes it much less necessary.
> There are two aspects to this:
>
> 1) higher priority contention to the medium.  If I have a real VOIP packet,
> there are ways I can ask for higher priority access to the medium and reduce
> the total number of transmit opportunities my traffic requires (and that is
> often the scarcest resource on WiFi or Docsis).
> 2) any hint I can get helps (at the edge) so I can distinguish those packets
> from the way the web has been gaming the network.
>
> Even so, for Diffserv to be safely trusted and honored even in your home,
> the end user (who is the network operator in this case) will have to be able
> to know that a device or application is using it and control whether or not
> it's honored. Unless it is under the network operator's control (in this
> case, you the home user) Diffserv can/will get gamed to uselessness by
> application and device manufacturers. Ergo why Dave and I hack on home
> routers again: this level of control is not currently present in these
> devices, and must be for Diffserv to be useful.
>
>>
>>
>>
>> My basic issue is one of terminology: people have talked about "best
>> effort" queues.  In reality, this is a "class" of service, rather than a
>> single queue, and when you get into the mental model of BE being a single
>> queue, (rather than a set of queues) it can lead one astray quickly and
>> easily.
>>
>>
>> Yeah, I know this. I suspected we were talking past each other.
>>
>> I need you to allow the other case into your mind for this conversation.
>> The wording is specifically about the case where "different subsets of
>> packets ... share the same queue".
>
>
> And the word "queue" in most people's minds implies ordering, and FIFO
> behavior. This is the terminological issue I'm harping on. It's also why I
> think "bufferbloat" is a better term for our situation than "queue bloat",
> which you liked and have harped at me about.  Buffers don't have such an
> implication of ordering.
> So if you talked about a buffer here, rather than a queue, I'd be a lot
> happier. At least in my mind, queues are ordered.
>
>>
>> We can talk about an fq structure for this another time, but it's a really
>> complicated way of doing it.
>>
>> Given simple looks like it could work, why get complicated already?
>
>
> Because the flow scheduling is such a win.  You can't solve the whole
> problem just with mark/drop algorithms and FIFO queues and get reliably to
> decent latencies.
>
> Now, whether Fred's document can/should go into anything like that level of
> detail is far from clear (arguably not).
>
> I just don't want to further the mythology that we can get to decent
> latencies at the edge of the network while continuing with FIFO queues and
> an AQM algorithm. It's clear that many haven't yet internalized that it's
> flow scheduling combined with a self-tuning adaptive mark/drop algorithm
> that we must have, and that the flow scheduling makes the biggest difference.
>
>>
>>
>> It's really easy to fall into the idea of a single software queue mapping
>> to some single hardware supported queue, and that's a cognitive mistake, as
>> aggregating MACs are showing us; transmit ops are often the scarcest
>> resource...
>>
>>
>> It's only a cognitive mistake if one is not aware of all the options. I'm
>> fully aware of all the options.
>>
>> To be specific, a queue into a wireless medium should be configured so it
>> holds some 'good queue' in reserve for transmit ops, but the queues on top
>> of this that TCP self-inflicts even briefly are not 'good queues' even if
>> they are isolated from other flows by fq - VJ was wrong to generalise the
>> phrase 'good queue' to all bursts of queue - it is only necessary to hold
>> back from signalling and allow a burst of queue if the only possible signal
>> is a drop. With ECN, you don't have this dilemma. This is the key to rapid
>> dynamics.
>
>
> But you also then have to solve the HOL blocking problem, and do so
> urgently.  Ergo flow scheduling.... To get to where we need to go, you have
> to worry about the order in which each packet is scheduled.
>
>>
>>
>> Diffserv marking has the potential to give a "hint" to distinguish how
>> particular flows should be handled (scheduled) in a service class, and as my
>> previous example shows, that hint may be very useful in channel access
>> decisions (e.g. voip on 802.11).
>
>
> ECN doesn't help a bit with the head of line blocking problem; it actually
> will make it worse with FIFO scheduling. ECN means that you can't even get
> the packets out of the way.
>
> Which says you'd better be doing more clever scheduling than a FIFO.
>
> If you want ECN to be usable in the way you want it to be at low bandwidths,
> you better become a fan of flow scheduling...
>>
>>
>> But fq_codel teaches the lesson that packet scheduling combined with
>> keeping TCP sane is a key improvement over handling either problem apart...
>> In particular, the first packets of new flows/reappearing flows are vastly
>> more "important" than other packets in terms of the latency cost to users of
>> that service. Each flow has in essence its own queue in this service class,
>> and we're using information from that to help schedule the packets in ways
>> that minimize latency to the user.
>>
>>
>> I know all this. Please can we keep to the conversation about how to avoid
>> the 200ms signalling delay that fq_codel inflicts on each flow (and the
>> similar signalling delays that other AQMs inflict).
>
>
> As I said, it's not the big problem we have today at the edge of the
> network, and Dave's mail explains your model of CoDel isn't so correct.
>
> It's only very long lived flows where the signalling even matters, and
> that's not what we get at the edge of today's network; instead, we get a mix
> of a few larger (e.g. video player flows) with the DOS attack that is web
> traffic, with some isochronous VOIP and teleconferencing traffic.
>>
>>
>>
>>
>> So in this case, a single algorithm is acting over a bunch of flows in a
>> single class of service, and both scheduling packets among the flows, and
>> signalling TCP flows appropriately when they should "slow down".
>>
>>
>> Yup, I know this.
>>
>>
>>
>> So I think you and I are on close to the same page (but have been burned
>> badly in the past by terminology issues getting in the way). On HTTP/1.1 we
>> wasted probably > 2 years talking past each other because we didn't have
>> clear and concise terminology that we all understood the same way.
>>
>>
>> As I thought, we are talking past each other.
>
>
> Yes, in part because I think few have internalized what the web has done to
> the edge of the network.  It isn't what I had hoped it would be when I was
> HTTP/1.1 editor.  Why this outcome occurred is too long a discussion for
> this thread.
>
>
>> We need to be able to have a conversation that is not always "Hmm, that
>> sounds like it might be interesting. Can I tell you about fq_codel now?"
>
>
> Running code of other algorithms very welcome. Fq_codel is running code.
> Pie is running code.
>
> Maybe fq_pie will be running code we can test someday.
>
> Even then, I want a low target latency so individual TCP's are kept
> responsive, and I need an algorithm that can keep up quickly with the
> dynamics of wireless, so most simplistic tests will not be useful (we don't
> have such good evaluation tests today).
>
> When it is, and if it is better than fq_codel, I'll be happy.  But the
> mark/drop part of the algorithm *isn't* the most important part of the
> algorithm. The packet scheduling decisions are....
>                                - Jim
>
>>
>>
>>
>>
>> Bob
>>
>>
>>
>> And I don't claim I have the right terminology for all this stuff, either
>> (even in this mail).
>>
>> Which is why I was loath to suggest exact text...
>>                            - Jim
>>
>>
>>
>> At 19:50 05/12/2013, Jim Gettys wrote:
>>
>>
>>
>> On Thu, Dec 5, 2013 at 1:13 PM, Bob Briscoe <[email protected]> wrote:
>> Fred, Gorry, all,
>> I promised to suggest text for draft-ietf-aqm-recommendation about
>> allowing the AQM's behaviour to be independent for ECN and non-ECN packets.
>> In the process, I realised we can't talk about independent AQMs for ECN
>> without also including Diffserv.
>> This gets messy, because I believe a good AQM for BE traffic with and
>> without ECN, should remove much if not all the need for Diffserv. But we
>> can't ignore Diffserv.
>>
>>
>> I agree in principle with what Bob is trying to say here (and is very much
>> what I've been saying in my blog entry of last summer).
>>
>> Once you have things under control, the need for Diffserv diminishes
>> dramatically (but does not go away).
>>
>> But as Bob notes, there is still a good use for Diffserv: suitably marked
>> traffic may want to contend for access to the channel differently: your
>> marked VOIP packets may want to change the priority with which you request
>> channel access, so that you get more timely access to the medium. This
>> conserves transmit opportunities, which is often the scarcest resource in
>> many systems (e.g. 802.11, DOCSIS, etc.). This can be the difference between
>> your VOIP working well, and not working well, on a busy 802.11 network as
>> well as using the channel as efficiently as possible.
>>
>> Similarly, if you have packets you know are background, it's helpful to
>> know that to ensure that they never contend for access to the medium but
>> will always defer to other traffic, and just scavenge available space in
>> other transmit opportunities where possible.
>>
>> I'm a bit loath, though, to tie the behavior to queues; in
>> particular, best effort traffic may want to be sent in the same aggregate as
>> higher (or lower) priority traffic, if there is remaining space in the
>> aggregate.
>>
>> In short, the mental model we've had that there is a one-to-one model of
>> hardware and software queues (not to mention flows in a given software
>> queue) is often incorrect (or at least seriously sub-optimal) in today's
>> systems (even if the hardware queues "work" properly, which it appears they
>> do not in 802.11).
>>
>> So I'm not sure Bob's new section 3 here is how to best to state this (or
>> to deal with the terminology problem).  Certainly, it may be the same
>> instance of an AQM algorithm, rather than different instances, for example.
>> And "It SHOULD be possible" is more a pious wish than anything else.  But I
>> agree in spirit with what Bob's trying to say.
>>                                - Jim
>>
>>
>> _________________________________________________________________________________________
>> {In Section 4: add another bullet between recommendations 2 & 3:}
>> 3{New}. It SHOULD be possible to make different instances of an AQM
>> algorithm apply to different subsets of packets that share the same queue.
>> It SHOULD be possible to classify packets into these subsets at least by ECN
>> codepoint [RFC3168] and Diffserv codepoint [RFC2474] (or the equivalent of
>> these fields at lower layers).
>> {Then a new section to expand on this before the current Section 4.3.}
>> 4.3{New}. Independent AQM Instances for ECN and Diffserv
>> The recommendation to provide a separate instance of the AQM for ECN
>> packets goes beyond the assumptions of RFC 3168, which assumed that only one
>> instance of an AQM will handle both ECN-capable and non-ECN-capable packets.
>>
>>
>> Bob
>>
>>
>> ________________________________________________________________ Bob
>> Briscoe,                                                  BT
>> _______________________________________________ aqm mailing list
>> [email protected] https://www.ietf.org/mailman/listinfo/aqm
>>
>>
>>
>>
>>
>>
>>
>>
>>
> I can imagine many other possible algorithms than CoDel for the mark/drop
> algorithm; we happen to like CoDel atm for two reasons: 1) it is self
> adapting to the line rate, and 2) head mark/drop signals the TCP's as soon
> as a decision is made rather than possibly being applied much later (such as
> random or tail drop).  We welcome and encourage further essays in the art.
>
>
>



--
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html

________________________________________________________________
Bob Briscoe, BT
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
