Re: [aqm] last call results on draft-ietf-aqm-recommendation

Bob Briscoe Fri, 16 May 2014 02:18:08 -0700

Dave,

At 21:25 15/05/2014, Dave Taht wrote:

On Thu, May 15, 2014 at 9:17 AM, Bob Briscoe <[email protected]> wrote:
> Gorry,
>
>
> At 16:55 15/05/2014, [email protected] wrote:
>>
>> Great, I look forward to comments on the actual text. I agree the front
>> part needs more structure and more topics called out. i started adding
>> that in -04 and would be pleased to add a few more subsections if we get
>> agreement.
>>
>> I'll wait until I see comments before looking at updating the text with
>> Fred.
>>
>> Gorry
>>
>> > Wes,
>> >
>> > Thx. In case I don't get time to read, then type, I'll shoot my mouth
>> > off anyway...
>> >
>> > Sorry this is a bit rushed and dismissive. That's not my intention -
>> > I'm v supportive of the recommendations that have now been carefully
>> > and nicely worded. I will give more detailed comments, but these are the
>> > MSBs.


I had hoped we were done.

My only major objection at this point was that I would like *all* the references
to have hyperlinks to the referenced material. Getting them online would
be a few days work for someone willing to make a few phone calls and
interlibrary loans.


>> > 1) My main concern: The two halves of the document seemed nearly
>> > unrelated (at least in draft-03 and it looks like draft-04 hasn't
>> > changed this). The first half (Sections 1,2,&3) framed the problem as
>> > primarily about preventing congestion collapse and preventing
>> > flow-unfairness, while the recommendations (section 4) were about
>> > AQM. The irony of this sentence is deliberate.
>> >
>> > I had few concerns about the recommendations text (section 4), which
>> > we've all been focusing on, including me. But I hadn't realised the
>> > introductory text was so out of kilter with the recommendations.

Well as it built on the original document, it retains its structure
and concerns,
and is an easy read for those that had read the original.

As apparently, not enough people paid attention to it the first time around,
perhaps more rewriting is required. I would certainly like to expose it to
an audience of CTOs as a use test, and ask for their feedback.

>> > Sections 1,2 and 3 seemed to focus on problems that I wouldn't even
>> > address with AQM (from a quick scan it looks like these sections
>> > haven't changed in this respect for draft-04):
>> >
>> > a) Congestion collapse: An AQM cannot prevent congestion collapse -
>> > that is the job of congestion control and, failing that, of policing.

I note that I explicitly deprecated conventional policing (a fixed size
buffer on ingress) in the home-gateway draft I've been meaning to finish
and keep thinking needs to be expanded into a
"need-for-comprehensive-queue-management"
draft. All these (aqm/fq/htb/policing/qos) concepts impact each other.

running an aqm/packet scheduler/rate limiter on inbound makes more sense than
buffer limits, and indeed, works better while achieving the desired result.

http://www.bufferbloat.net/projects/cerowrt/wiki/Wondershaper_Must_Die#Benchmarking-Wondershaper-vs-CeroWrts-SQM

As another example, how does the competing qos vs ecn enabled schemes
for webrtc fit into our big picture?

http://tools.ietf.org/html/draft-dhesikan-tsvwg-rtcweb-qos-01

vs

https://datatracker.ietf.org/doc/draft-zhu-rmcat-nada/

This is getting way-off the original point, which was that collapse is
not addressable by AQM, only by congestion control and, failing that,
policing.

Congestion collapse is a specific well-defined condition. Let's not get
side-tracked into references about it, how to solve it, etc. Because
it's just not solvable using AQM or flow-separation. It's irrelevant to
this WG's charter (not mentioned in the charter, and we were right not
to have mentioned it). Let's take all the stuff about collapse out.
Full stop.

>> > Even isolation (e.g. flow separation) doesn't prevent congestion
>> > collapse, because collapse is caused by the load from new flow
>> > arrivals exceeding the ability of the system to serve and clear load
>> > from existing flows, most likely because many existing flows are not
>> > sufficiently responsive to congestion, so retransmissions dominate
>> > over goodput (even if each unresponsive flow is in an isolated silo).
>> > Flow separation doesn't help when the problem is too many flows.
>> >
>> That would seem OK to call-out, at least to me.

Not sure where this goes in the text.

>
>
> My concern is that it's wrong to introduce a doc with a description of a
> problem that we're not addressing in the body of the doc (even tho collapse
> is an important problem, AQM doesn't address it, so why is it even relevant
> at all?). E.g. we could also add world hunger to the introduction, but it
> wouldn't be relevant.

Well, a collapse concern that AQM solves is where buffering is so
extreme as to start defeating
the native retry/retransmit logic built into many protocols. Fred has
shown pictures of TCP RTOs coming from excessive buffering fooling
tcp, the early bufferbloat work was all showing tcp getting fooled on
the overlong path, and there are plenty of other examples where
excessive buffering leads to excessive retransmits, like, well, nearly
everything over a second.

This is certainly a pathology that AQM solves but sensible buffer
sizing solves this too. Packets aren't actually getting lost so they do
leave the buffer eventually and the flows will depart from the system.
TCP is retransmitting because it thinks packets have probably got lost.
So more traffic is sent than needs to be sent. I'm not convinced this
will lead to the same conditions as a congestion collapse, but it
might... This condition inflates the load on the network by a
proportion that depends on the flow lengths.
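
A minimal sketch of the arithmetic behind this, under stated assumptions:
a single FIFO buffer that is full, a fixed link rate, and a sender with a
fixed retransmission timeout. The function names, the 256 KiB buffer, the
1 Mbit/s link and the 1 s RTO are all illustrative choices, not figures
from the draft or from any measurement.

```python
def queue_delay_s(buffer_bytes: int, link_bps: float) -> float:
    """Time to drain a full FIFO buffer at the link rate, in seconds."""
    return buffer_bytes * 8 / link_bps


def spurious_timeout(buffer_bytes: int, link_bps: float,
                     base_rtt_s: float, rto_s: float) -> bool:
    """True if a full buffer alone delays the ACK past the sender's RTO.

    No packet is lost in this model: the retransmission is triggered
    purely by queueing delay, so it is redundant traffic.
    """
    return base_rtt_s + queue_delay_s(buffer_bytes, link_bps) > rto_s


def load_inflation(flow_pkts: int, redundant_pkts: int) -> float:
    """Ratio of packets sent to packets needed for one flow."""
    return (flow_pkts + redundant_pkts) / flow_pkts


# Illustrative numbers only (assumptions, not measurements).
delay = queue_delay_s(256 * 1024, 1_000_000)
print(f"queueing delay with a full buffer: {delay:.3f} s")
print("spurious RTO:", spurious_timeout(256 * 1024, 1_000_000, 0.05, 1.0))
print("short flow inflation:", load_inflation(10, 2))
print("long flow inflation:", load_inflation(100, 2))
```

With the same number of redundant packets per flow, the short flow's load
is inflated proportionally more than the long flow's, which is one way to
read "a proportion that depends on the flow lengths".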

It's certainly not helpful to use the term congestion collapse in this
draft if this is what we want to mean, 'cos the cause of collapse has
always been described as poor end-system response, not stupidly bloated
buffers that drive the end-system response outside its normal range.

So again, I repeat that assigning a large part of the introduction to
talk about congestion collapse is not helpful.

>
>
>
>> > b) Flow fairness (or user-fairness etc): this is a policy issue that
>> > needs to be built in a modular way, for optional addition to AQM.

I don't really understand what you are saying here. My take on it is that
all the different components of a queueing system need to be thought
about and integrated as a whole.

http://www.bufferbloat.net/projects/cerowrt/wiki/Smart_Queue_Management

This is an important difference in goals then. I fundamentally
disagree. I start from the end-to-end principle:
* queue management in /every/ buffer does the minimum necessary to keep
  each queue short under /normal/ conditions.
* then per-user policing in /one/ edge buffer protects users from each
  other where they share the access into the network. The operator
  chooses the specifics (= policy)
* then intra-user, their applications and transports aim to schedule
  traffic to best use available bottleneck access capacity (application
  choices = app policy).
* then, the user /chooses/ a home gateway that might contain scheduling
  to protect their applications from each other, but might not
  (= policy).

Attempting to provide a completely integrated solution implies the
designer is closing off any choice (policy) at these different decision
stages.

>> > Therefore an AQM must also work well without fairness mechanisms.
>> > This conclusion was actually reached in the early sections, but it's
>> > not carried forward into the recommendations in section 4.
>> >
>> > If the conclusion is that AQM isn't intended to solve these two
>> > problems, we need to clearly say so. Most people who need to read
>> > this will be confused, so we shouldn't confuse them further!
>> >
>> OK - as long as we get agreement from the various AQM proposals, some
>> methods rely heavily on flow isolation to achieve their wanted behaviour.

Pie and codel are very comparable. In fact, people
are pushing ARED as well, with comparable (if, IMHO, overly
simplistic) benchmarks.

Now, if you are talking about some other proposal (SQF? LRED? SFQRED?) not yet
discussed here?

I'm talking about PIE & CoDel here. I think you (and others) do not
recommend CoDel unless it is integrated with FQ (=flow queueing). Whereas
PIE is intended to work without any scheduling, but FQ (='fair' queueing)
can be added if the vendor /chooses/ then the user can /choose/ which
vendors don't make the wrong judgements about their applications. This
last choice still isn't ideal, because users can't choose vendors that
don't screw with /future/ apps that will never be deployed because of
choices made earlier and embedded in network devices.

I'll come back to the rest later if time permits. I've got to catch a
train - got to take a load of WW2 memorabilia to a film company - but
that's another story!

I've tried to respond to your follow-up email on policy, with my
earlier comments in this email.

Bob

Anyway, on the two candidates:

ecn is off by default in both codel and pie because the results of all
the methods
tried to date for doing flood protection were less than satisfactory.

Pie got some ecn overflood protection that would be easy to add to
codel, and some
would like some stronger flooding protections in codel.

ecn is on by default in fq_codel because its method of dealing with
floods is more effective
than anything we've come up with for standalone single queue aqms -
and that too could be improved, but it's not been high on my radar,
and any improvement has a cpu cost that is hard to deal with on 10gigE
hardware.

So I'd say that "some methods rely heavily on flow isolation
integrated with aqm to achieve better behaviors than aqm alone" rather
than "some methods rely heavily on flow isolation to achieve their
wanted behaviour." in this case.

I do look forward to more/better algorithms being presented here one day.


>
> Indeed, my point about fairness is different from my point about collapse
> (which just isn't even relevant). Fairness isn't strictly relevant to AQM,
> but flow isolation is used to complement AQMs. And flow isolation (which the
> doc calls 'scheduling') can't be done without affecting fairness.
>
> So 2.1 concludes:
> "  In short, scheduling algorithms and queue management should be seen
>    as complementary, not as replacements for each other.
> "
> This is a conclusion that should be reflected in the recommendations and
> conclusions. I.e. if they are complements, they need to be separable, not
> integrated.

I agree with the complement language. I don't mind if they are separable.
Integration, however, is highly advantageous.

> Because scheduling requires policy and AQM doesn't.

Machine gunning down packets randomly until the flows start to behave
does not require any policy, agreed. a 5 tuple fq system is not a lot
of policy to impose. certainly
qos and rate scheduling systems impose a lot more policy.
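
A minimal sketch of what "5 tuple fq" means in terms of policy: the only
decision is which queue a packet lands in, keyed on addresses, ports and
protocol. The FiveTuple type, the queue_index helper, the salt and the
1024-queue figure are illustrative assumptions; this is not the hash that
fq_codel or any other implementation actually uses.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int  # IP protocol number, e.g. 6 for TCP, 17 for UDP


def queue_index(flow: FiveTuple, n_queues: int = 1024,
                salt: bytes = b"") -> int:
    """Map a flow to one of n_queues by hashing its 5-tuple.

    All packets of one flow share a queue; unrelated flows usually
    (not always, collisions are possible) get different queues.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.blake2b(key, digest_size=8, key=salt).digest()
    return int.from_bytes(digest, "big") % n_queues


web = FiveTuple("192.0.2.1", "198.51.100.7", 51000, 443, 6)
voip = FiveTuple("192.0.2.1", "198.51.100.9", 40000, 5004, 17)
print(queue_index(web), queue_index(voip))
```

The implicit policy is that every 5-tuple is treated as an equal, which
is exactly the judgement being argued about in this thread.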

> So operators
> don't want to have to face the dilemma of needing the AQM part, but not
> being able to have it because they don't want the policy implicit in the
> scheduling part.
>
> This is critical for fq_codel, because apparently CoDel alone is not
> recommended (which I would agree with).

Codel is being used stand alone in BSD with pretty good results with HFSC.

btw, if anyone here can describe how hfsc works in words of less than
4 syllables,
I'd love to hear it. It's apparently used a lot in the DSL world.

> This means that we really need fq_X,
> where X is something that can be recommended either alone or with fq.

Why? An algorithm's an algorithm. All algorithms have effects. To me
we need well described algorithms that have well described effects, and
that's it.

>
>
>
>
>> >
>> > 2) There's no statement of scope.
>> > Can we really make all these recommendations irrespective of whether
>> > we're talking about high stat-mux core links, low stat-mux access
>> > links, low-stat-mux data centre links, or host buffers? Are there
>> > different recommendations for edge links (on trust boundaries) vs
>> > interior links? Does AQM apply at L2 as well as L3 (of course it
>> > does)? Which recommendations are different for each layer? Does AQM
>> > apply for middleboxes (firewalls, NATs etc) as much as for switches
>> > and routers? If not why not (only need AQM if there can be queuing -
>> > perhaps due to processor overload)?

I agree that a more comprehensive, second set of documents would be good.
My hope was that we would build upon the evaluation guidelines
documents and tests
to eventually get to where these scenarios could be benchmarked and described,
relative to the other tests developed in the other working groups.

>> >
>> > To illustrate the problem, our goal should be AQM in every buffer.
>> > But we really don't need and shouldn't have policing or isolation in
>> > every buffer.

I could argue the opposite - that we could have scheduling in every
buffer that handles mixed flows, and drop tail or hw flow suffices to handle
the rest. Take a look at the tipc work, for example.

>> >
>> Hmmm... that will be interesting - suggest a para or two please?
>
>
> I'll try - but I'm having to write an ETSI doc at the mo (which is why I'm
> preferring to get distracted by AQM :)

Well, I'm trying to get a final release of cero done.

>
>
>> > 4) Because sections 1,2,3 focused heavily on the above two problems
>> > (collapse and fairness) that can't really be addressed by AQM, these
>> > sections also gave insufficient attention to problems that AQM does
>> > address (and should address), E.g.:
>> >
>> > * synchronisation and lock-out were both described as vaguely the
>> > same problem,
>> > * synchronisation wasn't explained,
>>
>> yes, agree completely should be added as a subsection in section 2 - This
>> was overlooked.

yes, some reference to tcp global sync.
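
A minimal sketch of how randomness avoids global synchronisation,
assuming a RED-style linear drop probability. The thresholds and max_p
are arbitrary illustrative values, and the sketch omits the averaging
and drop-spacing details of real RED. The point is only that dropping
each packet independently at random spreads loss across flows and time,
rather than hitting every flow in the same tail-drop burst.

```python
import random


def drop_probability(avg_queue: float, min_th: float,
                     max_th: float, max_p: float) -> float:
    """Linear ramp from 0 at min_th to max_p at max_th, then drop all."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_drop(avg_queue: float, rng: random.Random,
                min_th: float = 5, max_th: float = 15,
                max_p: float = 0.1) -> bool:
    """Decide independently for each arriving packet."""
    return rng.random() < drop_probability(avg_queue, min_th, max_th, max_p)


rng = random.Random(1)
drops = sum(should_drop(10, rng) for _ in range(10_000))
print(f"dropped {drops} of 10000 packets at an average queue of 10")
```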

>>
>> > * lock-out wasn't explained but it was said to be vaguely to do with
>> > synchronisation and solving it would help low capacity bursty flows,
>> Comment on bullet 3 in Sect 2.
>>
>> > * large-packet lock-out problems weren't mentioned, only flow-level
>> > lock-out.

While we are here, talking about the problems induced by bursty loss
might be good.

>> >
>> OK - suggest a line or two of text please:-)
>
>
> Will do.
>
>
>
>> > Perhaps as a result, there's no recommendation on avoiding
>> > synchronisation (e.g. using randomness).
>> >
>> There should be.
>>
>> > 5) I said these are the MSB's only, but allow me one nit about the
>> > Intro:
>> >
>> > "  there is currently no consensus solution to controlling the
>> >     congestion caused by such aggressive flows; significant research and
>> >     engineering will be required before any solution will be available.
>> >     It is imperative that this work be energetically pursued, to ensure
>> >     the future stability of the Internet.
>> > "
>> > The draft could at least mention congestion policing and ConEx RFCs
>> > coming out of the IETF right now (e.g. RFC6789 and
>> > draft-ietf-conex-abstract-mech, which is with the IESG).
>> >
>> >
>> > I promise I'll do a proper detailed review of the new text ASAP.
>
>
> Cheers
>
>
> Bob
>
>
>> >
>> >
>> > Bob
>> >
>> > At 13:13 15/05/2014, Wesley Eddy wrote:
>> >>On 5/15/2014 5:09 AM, Bob Briscoe wrote:
>> >> > Wes,
>> >> >
>> >> > I assume you also want comments on the new version. Is there a
>> >> deadline
>> >> > for comments?
>> >>
>> >>
>> >>Absolutely, yes.  There's no "deadline" at the moment, but it would
>> >>be good to get any out sooner rather than later, especially if they're
>> >>likely to need more discussion or are asking for major changes.
>> >>
>> >>
>> >> > I prepared comments on the previous version, but didn't get the time
>> >> to
>> >> > type them up. So I want to try to remedy this with the new version
>> >> (that
>> >> > I haven't read yet).
>> >>
>> >>
>> >>The diffs aren't huge, so many of your comments on the previous
>> >>revision might still be valid.
>> >>
>> >>
>> >>--
>> >>Wes Eddy
>> >>MTI Systems
>> >>
>> >>_______________________________________________
>> >>aqm mailing list
>> >>[email protected]
>> >>https://www.ietf.org/mailman/listinfo/aqm
>> >
>> > ________________________________________________________________
>> > Bob Briscoe,                                                  BT
>> >
>> > _______________________________________________
>> > aqm mailing list
>> > [email protected]
>> > https://www.ietf.org/mailman/listinfo/aqm
>> >
>
>
> ________________________________________________________________
> Bob Briscoe,                                                  BT
> _______________________________________________
> aqm mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/aqm



--
Dave Täht

NSFW:https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article


_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm


________________________________________________________________

Bob Briscoe, BT

