On Fri, Dec 13, 2013 at 12:15 AM, Dave Taht <[email protected]> wrote:
> For starters, the codel signaling delay from the onset of continuous
> over 5ms delay on packets defaults to target 100ms, not 200ms.

interval 100ms.

it has been a long week.

> I don't
> know who started saying 200ms but even I started believing it with the
> few brain cells I've had to spare of late. 5x a CDN rtt in a world of
> 30-60k images sounds about right.
>
> Secondly, codel drops/marks from head, not from tail, so the signal
> gets back to the sender in 1/2  the real physical RTT after that,
> rather than the tail of a queue that may be out of control at that
> point. Much faster than pie.
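For concreteness, the defaults being corrected here can be written down. Below is a toy sketch of CoDel's control law (illustrative only, not the reference implementation; the names are mine): the sojourn time must exceed the 5ms target for a full 100ms interval before the first head drop/mark, and subsequent drops then come at interval/sqrt(count).

```python
import math

TARGET = 0.005    # 5 ms: acceptable standing-queue sojourn time
INTERVAL = 0.100  # 100 ms: how long sojourn must stay above TARGET
                  # before the first head drop/mark (not 200 ms)

def next_drop_delay(count):
    """Delay until the next head drop once in the dropping state;
    shrinks as the inverse square root of the drop count, so the
    signalling rate ramps up gently."""
    return INTERVAL / math.sqrt(count)

# Sojourn exceeds TARGET continuously from t=0: first drop at ~INTERVAL,
# then successive drops arrive faster and faster.
t = INTERVAL
for count in range(2, 5):
    t += next_drop_delay(count)   # ~170.7 ms, ~228.4 ms, ~278.4 ms
```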
>
> There has been so much misinformation spread of late on these threads.
> I'm hoping we're beginning to make a dent in it? I look forward to
> making all this clear on the upcoming RFCs. I think I should stop now,
> revisit the rest of this thread and see what else can be cleared up
> before even beginning to tackle fq_codel after I get caught up on
> sleep.
>
> as for your other comments...
>
> I have always said deploy RED and for that matter DRR, SFQ or SQF
> where you can. I distinctly remember polling the crowd at the first
> UKNOF I went to and being sad to discover only about 4% of the room
> (4 people) had done so.
>
> I DO hold that red is too hard to configure for ordinary mortals, and
> that it doesn't work at all on variable bandwidth links like cable, or
> wireless, which happen to be the dominant form of end-user link
> nowadays.
>
> As for the hysteresis "problem", in practice it doesn't seem to be
> much of a problem. Things get well under control before a web page
> completes. Same goes for my tests against DASH traffic. I have plenty
> of plots and traces of this. Many are on the results webpage for
> bufferbloat.net.
>
> as for a good default for interval, a good number IS dependent on your
> RTT, and without coupling the ingress and output queues, it's
> difficult to determine or even auto tune that. Perhaps with connection
> tracking or some other form of coupling, one day. The ACC code from
> the gargoyle router project is worth looking at.
>
> I am satisfied that fq_codel can be deployed on fixed rate lines
> without any tuning on bandwidths ranging from 4mbit to 1gbit, today,
> as it stands. I have done hundreds of thousands of tests to prove
> that. Optimizations are helpful for the 3-band system that is what's
> mostly deployed today, such as smaller quantums on slow asymmetric
> links, and a smaller packet limit on low-memory routers.
>
> A larger target is working well on sub 4ms links. I think that could
> auto tune better.

s/4ms/4mbit/

>
> A lower target and interval seem right for data center use, but I have
> yet to get anyone to run my suite of published tests.
>
> A rate limiter is required to compensate for ISPs' lousy
> DSLAM/CMTS/GPON head ends and CPE, at least until this code makes it
> onto those devices. Long lead times predominate on this sort of
> hardware - We have three years to get DOCSIS 3.1 right, as one
> example.
>
> These are second order problems that will be fixed over time. Wifi and
> wireless remain problematic, but dents in those problems seem imminent
> by next year, and many of the problems aren't aqm or packet scheduling
> ones.
>
> SO. damn straight, I'm one of the people pushing for deployment,
> notably on boxes that are easy to upgrade and fix as we learn more
> about what we should be doing. I'm definitely reluctant to hard code
> stuff into big iron or hard to replace firmware as yet. But as Matt
> Mathis said at IETF - what we have is such an improvement over what is
> in place today, that it is time to deploy. After almost 3 years of
> effort I'm happy to have a few million boxes in place to learn more
> from. Aren't you?
>
> We just have a couple billion boxes left to fix. Plenty of time to
> tweak things as we go along. If you want RED, or ARED, in linux, it's
> been fixed now for 2 years to perform as to the spec. Go for it. If
> you could create something to automate RED configuration as I have for
> the ceroshaper tool in cerowrt, let me know.
>
> Any time someone has debloating code worth working on... I'm willing
> to help. I've been helping on pie, and as you know I've been looking
> over your DCTCP experiment carefully, finding and fixing bugs, and
> moving the code forward to where it can be compared against a modern
> kernel and a modern TCP and modern AQM and packet scheduling systems.
>
>
>
>
> On Thu, Dec 12, 2013 at 4:05 PM, Bob Briscoe <[email protected]> wrote:
>> Dave,
>>
>>
>> At 22:11 12/12/2013, Dave Taht wrote:
>>>
>>> but quickly...
>>>
>>> Bob, I object to your characterization of users' links being busy 1-3%
>>> of the time. That's an average.
>>
>>
>> I said it was an average. You're repeating and agreeing with what I said,
>> but saying you object to me saying it?
>>
>>
>>> When they are busy, they are very busy
>>> for short periods, typically 2-16 seconds in the case of web traffic,
>>> then idle for minutes. DASH traffic is busy for 2+ seconds every 10 on
>>> a 20mbit link, and so on, for 1.5 hours or so. Etc.
>>
>>
>> Yes, again, you're agreeing with me.
>>
>> The mean for a Web session is towards the low end of the 2-16 seconds range
>> even now. And as we get the other latency-saving advances out there (e.g.
>> removing TCP & TLS handshakes, proper pipelining, and a faster replacement
>> for slow-start without overshoot), then there is potential for Web sessions
>> to drop to 1-2 seconds or less, because they're usually a long way from
>> being bandwidth limited.
>>
>> You said fq_codel decays out its memory in 800ms. So it will typically have
>> lost all its memory when each Web transfer starts and when each DASH
>> transfer re-starts.
>>
>> So I think you're agreeing that this 200ms signalling delay will be
>> predominant?
>>
>>
>>> In both cases
>>> congestion exists, and in both cases AQM reaction time measured in
>>> 200ms or so is still vastly superior to what happens today, and packet
>>> scheduling masks it to a large extent.
>>
>>
>> You're saying it's OK to propose a solution that delays signalling
>> congestion for about 10 typical CDN RTTs,... because it's better than
>> nothing?
>>
>> That's rich. I have to say this... You're one of the group of protagonists
>> who has persuaded the world to embark on a programme of /implementation/
>> updates that will take years, and rubbished deploying the AQM that was
>> already implemented (RED), even tho it was already much better than nothing
>> too.
>>
>> Yes, auto-config for line rate is a nice bell to add to the bicycle. If we
>> are going to embark on new implementations, auto-config for RTT is no less
>> important.
>>
>> Lack of auto-config for either requires the fixed config to be at the slow
>> end of the range. And both have a similar range of variability (the RTT
>> range is actually wider). So lack of auto-config in either case leads to a
>> similar order of unnecessary delay. Particularly given traffic is
>> predominantly in short sparse bursts.
>>
>>
>>
>> Bob
>>
>>
>>
>>
>>> Jim, yes, I was trying to establish the groundwork for ensuring
>>> everyone really understood codel-by-itself before talking about
>>> fq_codel. I'm still not sure that's been established. Anyone here care
>>> to calculate the number of drops on two flows going through codel and
>>> fq_codel, starting with iw10, over a 10mbit 2ms RTT link? And when,
>>> approximately, the queue becomes ideal? And how often the fq_codel
>>> "fast queue" gets used in this case? Gold stars for everyone that gets
>>> it right.
>>>
>>> Jim, I'd like you to use larger download speeds than 1 mbit for your
>>> examples. somewhere between 8 and 20mbit seems appropriate. (IMHO iw10
>>> should not be used on the modern internet on sub 10Mbit links.) The
>>> dynamics change significantly as you get more bandwidth than iw10
>>> abuses.
>>>
>>> Anyway…
>>>
>>> After I got as far as describing fq_codel accurately in this thread,
>>> then I'd hoped to be able to tackle the immediate ECN issue, the value
>>> of randomness in pie, and the effectiveness and need for ECN on the
>>> edge as it is currently defined.
>>>
>>> and I figured that then I might have written enough to get closer to an
>>> rfc.
>>>
>>> and when I started kibitzing on this thread I thought I was talking to
>>> the DCTCP case which I've spent a few months studying up on, and
>>> looking over alternative ideas there, like
>>>
>>> http://conferences.sigcomm.org/co-next/2013/program/p49.pdf
>>> http://conferences.sigcomm.org/co-next/2013/program/p151.pdf
>>>
>>> Scalable, Optimal Flow Routing in Datacenters via Local Link Balancing
>>>
>>> http://www.irt-systemx.fr/wp-content/uploads/2013/12/AINTEC.ppt
>>>
>>> Sigh. I'll
>>>
>>> On Thu, Dec 12, 2013 at 11:35 AM, Jim Gettys <[email protected]> wrote:
>>> >
>>> >
>>> >
>>> > On Wed, Dec 11, 2013 at 2:21 PM, Bob Briscoe <[email protected]> wrote:
>>> >>
>>> >> Jim,
>>> >>
>>> >>
>>> >> At 16:55 11/12/2013, Jim Gettys wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Dec 10, 2013 at 10:04 PM, Bob Briscoe <[email protected]>
>>> >> wrote:
>>> >> Jim,
>>> >>
>>> >> I'm just checking we're not talking past each other. I'll repeat two
>>> >> quotes from each of us, then comment.
>>> >>
>>> >> On Thu, Dec 5, 2013 at 1:13 PM, Bob Briscoe <[email protected]> wrote:
>>> >>
>>> >> 3{New}. It SHOULD be possible to make different instances of an AQM
>>> >> algorithm apply to different subsets of packets that share the same
>>> >> queue.
>>> >> It SHOULD be possible to classify packets into these subsets at least
>>> >> by ECN
>>> >> codepoint [RFC3168] and Diffserv codepoint [RFC2474] (or the equivalent
>>> >> of
>>> >> these fields at lower layers),
>>> >>
>>> >>
>>> >> At 19:50 05/12/2013, Jim Gettys wrote:
>>> >>
>>> >> "Certainly, it may be the same instance of an AQM
>>> >> algorithm, rather than different instances, for example."
>>> >>
>>> >>
>>> >> That's true of course, but the case with one AQM handling all packets
>>> >> within a queue is the norm. I want to check you're happy with the
>>> >> converse:
>>> >> 1) A set-up more like WRED which was based on Dave Clark's RIO (RED
>>> >> with
>>> >> in and out of contract). So we can have WPIE, WCoDel etc where the
>>> >> differentiation between aggregates is provided by different AQM
>>> >> instances in
>>> >> the same queue, not by different queues with different scheduling
>>> >> priorities.
>>> >> 2) Extending this so that AQM differentiation can be between
>>> >> ECN-capable
>>> >> and Not-ECN-capable aggregates, not just between Diffserv classes (an
>>> >> example being CoDel with a lower 'interval' for ECN-capable packets).
>>> >>
>>> >> I presented the evaluations of this last idea in tsvwg on the final
>>> >> Friday
>>> >> of the Vancouver IETF - I don't think you were there. <
>>> >> http://www.ietf.org/proceedings/88/slides/slides-88-tsvwg-20.pdf >
>>> >>
>>> >>
>>> >> Yes, unfortunately I had to leave before the Friday session.
>>> >> This is my primary motivation for this wordsmithing - I'm trying to
>>> >> allow us
>>> >> to move towards zero signalling delays in CoDel, PIE and RED (currently
>>> >> defaults of 200ms, 100ms and 512packets respectively, which are not
>>> >> good for
>>> >> dynamics).
>>> >>
>>> >>
>>> >> Certainly signalling delays are very important: this is why I'm
>>> >> favorably
>>> >> inclined to "head mark/drop", as it signals TCP as quickly as possible,
>>> >> keeping the response of the TCP feedback loop as tight as possible (and
>>> >> part
>>> >> of why I like CoDel so much for the highly variable bandwidth problem
>>> >> we
>>> >> face at the edge of the net).
>>> >>
>>> >> It's *really* important that when the bandwidth drops suddenly
>>> >> everyone gets told to slow down quickly (exactly how quickly probably
>>> >> depends on the propagation change characteristics of the medium), or
>>> >> packets
>>> >> can pile up in a big way.
>>> >>
>>> >> How quickly the mark/drop algorithm can figure out that signalling is
>>> >> appropriate is the *other* piece of getting good dynamics.  Here I don't
>>> >> doubt in the slightest that something better than CoDel may be
>>> >> discovered.
>>> >> It takes a CoDel instance (within an fq structure) 200ms from its queue
>>> >> first passing 'threshold' before it will ever drop the first packet
>>> >> (unless
>>> >> the queue hits taildrop before that). So if the RTT is 20ms, that's
>>> >> 220ms
>>> >> signalling delay.
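Making Bob's arithmetic explicit, but substituting the corrected 100ms default interval from the top of this thread for the 200ms used here: under his assumption that a new flow's queue exceeds the target from the start, the worst-case signalling delay is roughly one interval plus one RTT.

```python
INTERVAL = 0.100   # CoDel's default interval (100 ms, not 200 ms)
RTT = 0.020        # Bob's example path RTT

# Queue passes target at t=0; first head drop after one full INTERVAL.
# The head drop is seen by the receiver about RTT/2 later, and the
# resulting dupack/ECN echo reaches the sender about RTT/2 after that.
signalling_delay = INTERVAL + RTT   # 0.12 s: 120 ms rather than 220 ms
```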
>>> >
>>> >
>>> > No, again, see Dave's mail, and you are missing the flow scheduling
>>> > aspect
>>> > of this and thinking in terms of a single queue and the usual mark/drop
>>> > cases; this is the exact cognitive problem I'm talking about.
>>> >
>>> > The flow scheduling aspect of fq_codel is *more* important than what
>>> > mark/drop algorithm decides to signal the TCP's to regulate their servo
>>> > systems.  I really wish this algorithm had been called "fs_codel",
>>> > rather
>>> > than "fq_codel", as it is so easy to confuse "fair" with "flow", and
>>> > "queue"
>>> > with "scheduling".
>>> >
>>> > Regulating TCP is absolutely essential to keep TCP "sane" and
>>> > responsive,
>>> > but it isn't the most important part of what is going on.
>>> >
>>> > Here's the case of a new TCP flow on a previously idle link: After the
>>> > TCP
>>> > open handshake, you will have no more than 4 or (if IW10 is active) 10
>>> > packets
>>> > for that flow in the queue (actually, 3 packets, as presumably at least
>>> > one
>>> > packet is in process of transmission; @ 1Mbps, that first full size
>>> > packet
>>> > takes 13ms).
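Jim's 13ms figure is just serialization delay: the time to clock a full-size frame onto a 1Mbps wire. A quick check (1500-byte MTU assumed; real framing overhead pushes ~12ms up toward 13ms):

```python
def serialization_ms(frame_bytes, link_bps):
    """Milliseconds to clock one frame onto the wire."""
    return frame_bytes * 8 / link_bps * 1000

one_frame = serialization_ms(1500, 1e6)   # 12.0 ms at 1 Mbps
# Three full-size packets queued ahead of a new flow therefore cost
# roughly 36-40 ms of head-of-line blocking at this rate.
hol_cost = 3 * one_frame                  # 36.0 ms
```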
>>> >
>>> > If another flow starts, it will get preferentially scheduled to the
>>> > existing
>>> > flow(s) until it has also built a queue, at which time it competes
>>> > "fairly"
>>> > with the other flows in the packet scheduling.
>>> > This means, in the simple low bandwidth case, that as soon as you start
>>> > the
>>> > second flow, it gets the next possible opportunity to be transmitted in
>>> > preference to any flow that has built a queue.
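The preference Jim describes is fq_codel's two-tier DRR: flows that have just appeared (or reappeared after draining) sit on a "new flows" list that is always serviced before the "old flows" list. A toy model follows; it ignores byte quantums/deficits and CoDel itself, and the names are illustrative, not the kernel's:

```python
from collections import deque

class ToyFQ:
    """Toy of fq_codel's two-tier scheduling: flows whose queue just
    (re)appeared sit on the new-flows list and are dequeued first."""
    def __init__(self):
        self.new_flows = deque()   # flows seen since their queue drained
        self.old_flows = deque()   # flows that have built a backlog
        self.queues = {}           # flow id -> deque of packets

    def enqueue(self, flow, pkt):
        q = self.queues.setdefault(flow, deque())
        if not q and flow not in self.new_flows and flow not in self.old_flows:
            self.new_flows.append(flow)   # brand-new or re-idled flow
        q.append(pkt)

    def dequeue(self):
        lists = self.new_flows if self.new_flows else self.old_flows
        if not lists:
            return None
        flow = lists.popleft()
        pkt = self.queues[flow].popleft()
        if self.queues[flow]:
            # still backlogged: demote behind the other old flows
            self.old_flows.append(flow)
        return pkt
```

A quick trace shows the behaviour in the text: with three bulk packets queued, a newly arriving VoIP or second-flow packet is sent at the very next transmit opportunity instead of waiting behind the backlog.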
>>> >
>>> > Here's the kicker:
>>> >
>>> > We've just avoided 3 packets' worth of "head of line blocking" latency for
>>> > the
>>> > second flow to "do its thing".  @1Mbs, that is 40ms (3*13ms) saved just
>>> > in
>>> > the TCP open, and more in that the first packet from the second flow
>>> > then
>>> > gets scheduled immediately, getting that TCP moving.
>>> >
>>> > Similarly for your voip packet; it saves those 40ms. And your bulk flow
>>> > gets
>>> > its first packet through ASAP as well; for web traffic, that usually
>>> > contains the size information or other metadata required for the web
>>> > browser
>>> > to unblock.  And that is for a *single* competing TCP connection.
>>> >
>>> > Now, let's look at today's web: there are many embedded objects in a
>>> > page.
>>> > Once the base page is downloaded, the web browser (presuming its DNS
>>> > lookups
>>> > are already cached), opens a pile of connections all at once.  Whether I
>>> > like it or not, (which I don't, as I tried to prevent it with pipelining
>>> > in
>>> > HTTP/1.1 in the 1990's), you can easily have 5-30 TCP connections start
>>> > almost simultaneously. See the connections/page plot on my web page
>>> > explaining all this stuff in more detail at:
>>> >
>>> > http://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/
>>> >
>>> > Without any TCP flow control ever happening, I can easily get 40 (or
>>> > even
>>> > several hundred packets!) arriving at near line rate (the initial window
>>> > *
>>> > number of embedded objects).  Most of these TCP connections *won't ever
>>> > even
>>> > get out of slow start*; the objects are typically pretty small.  I've
>>> > observed transient latency from HOL blocking of > 100ms on 50Mbps cable
>>> > service, on some web pages.
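Jim's burst arithmetic, spelled out (IW10, 1500-byte frames, and a 50Mbps link assumed): the initial windows alone can put hundreds of packets in flight before any congestion signal is possible, and on a FIFO every one of them blocks whatever arrives behind it.

```python
def burst_packets(initial_window, connections):
    """Packets arriving at ~line rate before any TCP feedback: one
    initial window per simultaneously opened connection."""
    return initial_window * connections

def fifo_hol_ms(packets, frame_bytes, link_bps):
    """Worst-case FIFO head-of-line delay behind that burst."""
    return packets * frame_bytes * 8 / link_bps * 1000

burst = burst_packets(10, 30)            # 300 packets for 30 connections
delay = fifo_hol_ms(burst, 1500, 50e6)   # 72.0 ms on 50 Mbps cable
# Framing overhead and competing traffic push this toward the >100 ms
# transients observed above.
```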
>>> >
>>> > Nothing we will/can do at the TCP level can help this situation!!!
>>> > Nothing,
>>> > other than in the long term getting the incentives right so that fewer
>>> > TCP
>>> > connections might be preferable to gaming TCP, as the web already has.
>>> >
>>> > That CoDel may take a while to figure out that it should start marking
>>> > in
>>> > the idle case is really pretty irrelevant; the packets have already
>>> > arrived
>>> > and unless we do better packet scheduling, you are fully stuck.  Dave's
>>> > trying to also explain that a number of people's understanding of how
>>> > CoDel
>>> > works has been wrong.
>>> >
>>> > But as I say, *the details of the mark/drop algorithm don't matter*.
>>> > Having 3, or 100 packets queued in a FIFO queue will already mean you
>>> > are
>>> > screwed for anything low latency at the bandwidths we all care about.
>>> >
>>> > Once a flow drains to zero, it again gets treated as a new flow.
>>> > And what we choose to define a "flow" to be is arbitrary, though the
>>> > code
>>> > today does the usual 5tuple.
>>> >
>>> >> In fq_codel this creates considerable self-delay for short flows or r-t
>>> >> apps, which kill their own latency before they get any loss signal to
>>> >> tell
>>> >> them to slow down.
>>> >
>>> >
>>> > Not likely in the common case.  See Dave's comments and also note that
>>> > the
>>> > initial burst of web page load runs the queue up high immediately.
>>> >
>>> > And as I keep saying, and will say again, the flow queuing decisions
>>> > avoiding HOL blocking explained are much more important to dealing with
>>> > the
>>> > latency problems.
>>> >
>>> > So go invent better algorithms than CoDel for drop/mark, that can be
>>> > applied
>>> > to the flow queuing parts of the algorithm, and I'm happy.  Running code
>>> > please.
>>> >
>>> >
>>> >
>>> >>
>>> >> Even for elastic flows, with congestion signals delayed by so much,
>>> >> they
>>> >> risk hitting themselves with a huge train of overshoot loss. This would
>>> >> be
>>> >> the same for fq_pie, except the number is 100ms + RTT.
>>> >
>>> >
>>> > When fq_pie exists to test, I'll be happy to see.
>>> >
>>> > I keep saying, and I'll say again: while some mark/drop algorithm needs
>>> > to
>>> > exist, flow scheduling is more important to getting good latency....
>>> >
>>> >
>>> >>
>>> >>
>>> >> Yes, the e2e transport could measure delay growth, but it doesn't know
>>> >> whether the delay is coming from a queue that is isolated from others
>>> >> or
>>> >> not. So it doesn't want to slow down too quickly in response to delay
>>> >> growth
>>> >> in case it gets screwed by other traffic. Ie. using delay growth as a
>>> >> signal
>>> >> entails considerable signalling delay due to all the uncertainty.
>>> >>
>>> >> The proposal you missed in tsvwg was to define ECN as an immediate
>>> >> signal
>>> >> from the network, 'interval'=0 in CoDel terms, so the host always gets
>>> >> congestion signals as fast as possible, and if it needs bursts of
>>> >> signals
>>> >> smoothed out, it can do that itself.
>>> >
>>> >
>>> > Yes, I get that.  I got that a year or more ago. The idea has potential
>>> > merit. ECN as it is currently defined is not so useful.
>>> >
>>> > And I wish it would get a different name so we could more easily talk
>>> > about
>>> > the two different things now being called ECN.
>>> >
>>> > I can see having ECN marks on a burst of packets may be helpful in
>>> > having
>>> > the receiver judge things in a highly variable wireless scenario; it may
>>> > have additional information about the medium and know that that
>>> > transient
>>> > has gone away, and that it may not be a wise idea to slow the connection
>>> > at
>>> > all.
>>> >
>>> >
>>> >>
>>> >> The suggested wording ensures all AQM implementations will allow
>>> >> operators, vendors and users to configure such a mechanism. But I've
>>> >> generalised it from ECN to Diffserv too (because the implementation
>>> >> would be
>>> >> no different).
>>> >
>>> >
>>> > As noted before, Diffserv is still interesting, even though the packet
>>> > scheduling in fq_codel (or similar algorithms) makes it much less
>>> > necessary.
>>> > There are two aspects to this:
>>> >
>>> > 1) higher priority contention to the medium.  If I have a real VOIP
>>> > packet,
>>> > there are ways I can ask for higher priority access to the medium and
>>> > reduce
>>> > the total number of transmit opportunities my traffic requires (and that
>>> > is
>>> > often the scarcest resource on WiFi or Docsis).
>>> > 2) any hint I can get helps (at the edge) so I can distinguish those
>>> > packets
>>> > from the way the web has been gaming the network.
>>> >
>>> > Even so, for Diffserv to be safely trusted and honored even in your
>>> > home,
>>> > the end user (who is the network operator in this case), will have to
>>> > be able
>>> > to know that a device or application is using it and control whether or
>>> > not
>>> > it's honored. Unless it is under the network operator's control (in this
>>> > case, you the home user) Diffserv can/will get gamed to uselessness by
>>> > application and device manufacturers. Ergo why Dave and I hack on home
>>> > routers again: this level of control is not currently present in these
>>> > devices, and must be for Diffserv to be useful.
>>> >
>>> >>
>>> >>
>>> >>
>>> >> My basic issue is one of terminology: people have talked about "best
>>> >> effort" queues.  In reality, this is a "class" of service, rather than
>>> >> a
>>> >> single queue, and when you get into the mental model of BE being a
>>> >> single
>>> >> queue, (rather than a set of queues) it can lead one astray quickly and
>>> >> easily.
>>> >>
>>> >>
>>> >> Yeah, I know this. I suspected we were talking past each other.
>>> >>
>>> >> I need you to allow the other case into your mind for this
>>> >> conversation.
>>> >> The wording is specifically about the case where "different subsets of
>>> >> packets ... share the same queue".
>>> >
>>> >
>>> > And the word "queue" in most people's minds implies ordering, and FIFO
>>> > behavior.  This is the terminological issue I'm harping on.  It's also
>>> > why I
>>> > think "bufferbloat" is a better term for our situation than "queue
>>> > bloat",
>>> > which you liked and have harped at me about.  Buffers don't have such an
>>> > implication of ordering.
>>> > So if you talked about a buffer here, rather than a queue, I'd be a lot
>>> > happier. At least in my mind, queues are ordered.
>>> >
>>> >>
>>> >> We can talk about an fq structure for this another time, but it's a
>>> >> really
>>> >> complicated way of doing it.
>>> >>
>>> >> Given that simple looks like it could work, why get complicated already?
>>> >
>>> >
>>> > Because the flow scheduling is such a win.  You can't solve the whole
>>> > problem just with mark/drop algorithms and FIFO queues and get reliably
>>> > to
>>> > decent latencies.
>>> >
>>> > Now, whether Fred's document can/should go into anything like that that
>>> > detail is far from clear (arguably not).
>>> >
>>> > I just don't want to further the mythology that we can get to decent
>>> > latencies at the edge of the network while continuing with FIFO queues
>>> > and an AQM algorithm; it's clear that many haven't yet internalized
>>> > that it's flow scheduling combined with a self-tuning adaptive
>>> > mark/drop algorithm that we must have, and that the flow scheduling
>>> > makes the biggest difference.
>>> >
>>> >>
>>> >>
>>> >> It's really easy to fall into the idea of a single software queue
>>> >> mapping
>>> >> to some single hardware supported queue, and that's a cognitive
>>> >> mistake, as
>>> >> aggregating MACs are showing us; transmit ops are often the scarcest
>>> >> resource...
>>> >>
>>> >>
>>> >> It's only a cognitive mistake if one is not aware of all the options.
>>> >> I'm
>>> >> fully aware of all the options.
>>> >>
>>> >> To be specific, a queue into a wireless medium should be configured so
>>> >> it
>>> >> holds some 'good queue' in reserve for transmit ops, but the queues on
>>> >> top
>>> >> of this that TCP self-inflicts even briefly are not 'good queues' even
>>> >> if
>>> >> they are isolated from other flows by fq - VJ was wrong to generalise
>>> >> the
>>> >> phrase 'good queue' to all bursts of queue - it is only necessary to
>>> >> hold
>>> >> back from signalling and allow a burst of queue if the only possible
>>> >> signal
>>> >> is a drop. With ECN, you don't have this dilemma. This is the key to
>>> >> rapid
>>> >> dynamics.
>>> >
>>> >
>>> > But you also then have to solve the HOL blocking problem, and do so
>>> > urgently.  Ergo flow scheduling.... To get to where we need to go, you
>>> > have
>>> > to worry about the order in which each packet is scheduled.
>>> >
>>> >>
>>> >>
>>> >> Diffserv marking has the potential to give a "hint" to distinguish how
>>> >> particular flows should be handled (scheduled) in a service class, and
>>> >> as my
>>> >> previous example shows, that hint may be very useful in channel access
>>> >> decisions (e.g. voip on 802.11).
>>> >
>>> >
>>> > ECN doesn't help a bit with the head of line blocking problem; it
>>> > actually
>>> > will make it worse with FIFO scheduling. ECN means that you can't even
>>> > get
>>> > the packets out of the way.
>>> >
>>> > Which says you'd better be doing more clever scheduling than a FIFO.
>>> >
>>> > If you want ECN to be usable in the way you want it to be at low
>>> > bandwidths,
>>> > you better become a fan of flow scheduling...
>>> >>
>>> >>
>>> >> But fq_codel teaches the lesson that packet scheduling combined with
>>> >> keeping TCP sane is a key improvement over handling either problem
>>> >> apart...
>>> >> In particular, the first packets of new flows/reappearing flows are
>>> >> vastly
>>> >> more "important" than other packets in terms of the latency cost to
>>> >> users of
>>> >> that service. Each flow has in essence its own queue in this service
>>> >> class,
>>> >> and we're using information from that to help schedule the packets in
>>> >> ways
>>> >> that minimize latency to the user.
>>> >>
>>> >>
>>> >> I know all this. Please can we keep to the conversation about how to
>>> >> avoid
>>> >> the 200ms signalling delay that fq_codel inflicts on each flow (and the
>>> >> similar signalling delays that other AQMs inflict).
>>> >
>>> >
>>> > As I said, it's not the big problem we have today at the edge of the
>>> > network, and Dave's mail explains your model of CoDel isn't so correct.
>>> >
>>> > It's only very long lived flows where the signalling even matters, and
>>> > that's not what we get at the edge of today's network; instead, we get a
>>> > mix
>>> > of a few larger (e.g. video player flows) with the DOS attack that is
>>> > web
>>> > traffic, with some isochronous VOIP and teleconferencing traffic.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> So in this case, a single algorithm is acting over a bunch of flows in
>>> >> a
>>> >> single class of service, and both scheduling packets among the flows,
>>> >> and
>>> >> signalling TCP flows appropriately when they should "slow down".
>>> >>
>>> >>
>>> >> Yup, I know this.
>>> >>
>>> >>
>>> >>
>>> >> So I think you and I are on close to the same page (but have been
>>> >> burned
>>> >> badly in the past by terminology issues getting in the way).  On
>>> >> HTTP/1.1 we
>>> >> wasted probably > 2 years talking past each other because we didn't
>>> >> have
>>> >> clear and concise terminology that we all understood the same way.
>>> >>
>>> >>
>>> >> As I thought, we are talking past each other.
>>> >
>>> >
>>> > Yes, in part because I think few have internalized what the web has done
>>> > to
>>> > edge of the network.  It isn't what I had hoped it would be when I was
>>> > HTTP/1.1 editor.  Why this outcome occurred is too long a discussion for
>>> > this thread.
>>> >
>>> >
>>> >> We need to be able to have a conversation that is not always "Hmm,
>>> >> that's
>>> >> sounds like it might be interesting. Can I tell you about fq_codel
>>> >> now?"
>>> >
>>> >
>>> > Running code of other algorithms very welcome. Fq_codel is running code.
>>> > Pie is running code.
>>> >
>>> > Maybe fq_pie will be running code we can test someday.
>>> >
>>> > Even then, I want a low target latency so individual TCP's are kept
>>> > responsive, and I need an algorithm that can keep up quickly with the
>>> > dynamics of wireless, so most simplistic tests will not be useful (we
>>> > don't
>>> > have such good evaluation tests today).
>>> >
>>> > When it is, and if it is better than fq_codel, I'll be happy.  But the
>>> > mark/drop part of the algorithm *isn't* the most important part of the
>>> > algorithm. The packet scheduling decisions are....
>>> >                                - Jim
>>> >
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Bob
>>> >>
>>> >>
>>> >>
>>> >> And I don't claim I have the right terminology for all this stuff,
>>> >> either
>>> >> (even in this mail).
>>> >>
>>> >> Which is why I was loath to suggest exact text...
>>> >>                            - Jim
>>> >>
>>> >>
>>> >>
>>> >> At 19:50 05/12/2013, Jim Gettys wrote:
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Dec 5, 2013 at 1:13 PM, Bob Briscoe <[email protected]> wrote:
>>> >> Fred, Gorry, all,
>>> >> I promised to suggest text for draft-ietf-aqm-recommendation about
>>> >> allowing the AQM's behaviour to be independent for ECN and non-ECN
>>> >> packets.
>>> >> In the process, I realised we can't talk about independent AQMs for ECN
>>> >> without also including Diffserv.
>>> >> This gets messy, because I believe a good AQM for BE traffic with and
>>> >> without ECN, should remove much if not all the need for Diffserv. But
>>> >> we
>>> >> can't ignore Diffserv.
>>> >>
>>> >>
>>> >> I agree in principle with what Bob is trying to say here (and is very
>>> >> much
>>> >> what I've been saying in my blog entry of last summer).
>>> >>
>>> >> Once you have things under control, the need for Diffserv diminishes
>>> >> dramatically (but does not go away).
>>> >>
>>> >> But as Bob notes, there is still a good use for Diffserv: suitably
>>> >> marked
>>> >> traffic may want to contend for access to the channel differently: your
>>> >> marked VOIP packets may want to change the priority with which you
>>> >> request
>>> >> channel access, so that you get more timely access to the medium. This
>>> >> conserves transmit opportunities, which is often the scarcest resource
>>> >> in
>>> >> many systems (e.g. 802.11, DOCSIS, etc.). This can be the difference
>>> >> between
>>> >> your VOIP working well, and not working well, on a busy 802.11 network
>>> >> as
>>> >> well as using the channel as efficiently as possible.
>>> >>
>>> >> Similarly, if you have packets you know are background, it's helpful to
>>> >> know that to ensure that they never contend for access to the medium
>>> >> but
>>> >> will always defer to other traffic, and just scavenge available space
>>> >> in
>>> >> other transmit opportunities where possible.
>>> >>
>>> >> I'm a bit loath, though, to tie the behavior to queues; in
>>> >> particular, best effort traffic may want to be sent in the same
>>> >> aggregate as
>>> >> higher (or lower) priority traffic, if there is remaining space in the
>>> >> aggregate.
>>> >>
>>> >> In short, the mental model we've had that there is a one-to-one model
>>> >> of
>>> >> hardware and software queues (not to mention flows in a given software
>>> >> queue) is often incorrect (or at least seriously sub-optimal) in
>>> >> today's
>>> >> systems (even if the hardware queues "work" properly, which it appears
>>> >> they
>>> >> do not in 802.11).
>>> >>
>>> >> So I'm not sure Bob's new section 3 here is how to best to state this
>>> >> (or
>>> >> to deal with the terminology problem).  Certainly, it may be the same
>>> >> instance of an AQM algorithm, rather than different instances, for
>>> >> example.
>>> >> And "It SHOULD be possible" is more a pious wish than anything else.  But I
>>> >> agree in spirit with what Bob's trying to say.
>>> >>                                - Jim
>>> >>
>>> >>
>>> >>
>>> >> _________________________________________________________________________________________
>>> >> {In Section 4: add another bullet between recommendations 2 & 3:}
>>> >> 3{New}. It SHOULD be possible to make different instances of an AQM
>>> >> algorithm apply to different subsets of packets that share the same
>>> >> queue.
>>> >> It SHOULD be possible to classify packets into these subsets at least
>>> >> by ECN
>>> >> codepoint [RFC3168] and Diffserv codepoint [RFC2474] (or the equivalent
>>> >> of
>>> >> these fields at lower layers).
>>> >> {Then a new section to expand on this before the current Section 4.3.}
>>> >> 4.3{New}. Independent AQM Instances for ECN and Diffserv
>>> >> The recommendation to provide a separate instance of the AQM for ECN
>>> >> packets goes beyond the assumptions of RFC 3168, which assumed that
>>> >> only one
>>> >> instance of an AQM will handle both ECN-capable and non-ECN-capable
>>> >> packets.
>>> >>
>>> >>
>>> >> Bob
>>> >>
>>> >>
>>> >> ________________________________________________________________ Bob
>>> >> Briscoe,                                                  BT
>>> >> _______________________________________________ aqm mailing list
>>> >> [email protected] https://www.ietf.org/mailman/listinfo/aqm
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> > I can imagine many other possible algorithms than CoDel for the
>>> > mark/drop
>>> > algorithm; we happen to like CoDel atm for two reasons: 1) it is self
>>> > adapting to the line rate, and 2) head mark/drop signals the TCP's as
>>> > soon
>>> > as a decision is made rather than possibly being applied much later
>>> > (such as
>>> > random or tail drop).  We welcome and encourage further essays in the
>>> > art.
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>>
>>> Fixing bufferbloat with cerowrt:
>>> http://www.teklibre.com/cerowrt/subscribe.html
>>
>>
>
>
>


