[aqm] Comments and proposed changes to AQM rec -04

gorry Fri, 13 Jun 2014 11:44:23 -0700

Following the WGLC there was some further discussion of the AQM WG
recommendations draft. Some of this has suggested further updates to this
draft. This is a list of the key points we noted, and our proposed action
to update draft -04.


1) DOCUMENT FLOW
COMMENT BB: The two halves of the document seemed nearly
unrelated (at least in draft-03 and it looks like draft-04 hasn't
changed this). The first half (Sections 1,2,&3) framed the problem as
primarily about preventing congestion collapse and preventing
flow-unfairness, while the recommendations (section 4) were about AQM.
ACTION:
In the new revision we will rearrange/revise section 2 to better align
with the document, but we have not done a major revision.
---
2) CONGESTION COLLAPSE
BB: Let's take all the stuff about collapse out. Full stop.
ACTION:
At the moment Congestion Collapse continues to be mentioned, but the text
is clear that AQM is just one tool and the primary mechanism is now
clearly called out as end-to-end methods: CC and CB.
---
3) FLOW FAIRNESS
COMMENT BB: Flow fairness (or user-fairness etc): this is a policy issue
that needs to be built in a modular way, for optional addition to AQM.
ACTION:
This was intended to be clearer in -04, and the text has also been updated.
---
2a) fairness mechanisms
COMMENT BB:  An AQM must also work well without fairness mechanisms. This
conclusion was actually reached in the early sections, but it's not
carried forward into the recommendations in section 4.
ACTION:
Added this to 4.1:
        AQM mechanisms need to allow combination with other mechanisms,
        such as scheduling, to allow implementation of policies for
        providing fairness between different flows.
---
3) Statement of scope.
BB: Can we really make all these recommendations irrespective of whether
we're talking about high stat-mux core links, low stat-mux access links,
low-stat-mux data centre links, or host buffers? Are there different
recommendations for edge links (on trust boundaries) vs interior links?
Does AQM apply at L2 as well as L3 (of course it does)? Which
recommendations are different for each layer? Does AQM apply for
middleboxes (firewalls, NATs etc) as much as for switches and routers? If
not why not (only need AQM if there can be queuing - perhaps due to
processor overload)?
To illustrate the problem, our goal should be AQM in every buffer.
But we really don't need and shouldn't have policing or isolation in every
buffer.
NOTE: Gorry - this seems like an issue we have been through, putting AQM
in the core would need care, given that most AQM methods we know are
optimised for RTT (for instance) which will effectively degrade people
"further" away in the Internet?
ACTION:
The goals of the draft were clarified at the start of section 2:
   Active Queue Management (AQM) is a method that allows network devices
   to control the queue length or the mean time that a packet spends in
   a queue.  Although AQM can be applied across a range of deployment
   environments, the recommendations in this document are directed to use
   in the general Internet.  It is expected that the principles and
   guidance are also applicable to a wide range of environments, but may
   require tuning for specific types of link/network (e.g. to
   accommodate the traffic patterns found in data centres, the
   challenges of wireless infrastructure, or the higher delay
   encountered on satellite Internet links).  The remainder of this
   section identifies the need for AQM and the advantages of deploying
   the method.
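
As a rough illustration of how queue length and time spent in the queue
relate (an editorial sketch, not text from the draft): for a FIFO queue
drained at a constant link rate, a newly arrived packet waits for roughly
the backlog ahead of it divided by that rate. The function name and the
example figures are assumptions chosen only for illustration.

```python
def queuing_delay_ms(backlog_bytes, link_rate_bps):
    """Approximate wait for a packet arriving behind `backlog_bytes`.

    Assumes a single FIFO queue drained at a constant `link_rate_bps`
    (hypothetical simplification; real links and schedulers vary).
    """
    return backlog_bytes * 8 * 1000 / link_rate_bps


# The same 150 kB standing queue costs far more delay on a slow link.
print(queuing_delay_ms(150_000, 10_000_000))   # 120.0 ms at 10 Mb/s
print(queuing_delay_ms(150_000, 100_000_000))  # 12.0 ms at 100 Mb/s
```

This is one reason a fixed queue-length target may need tuning per link
type, as the quoted text notes.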
---
4) synchronisation and lock-out
COMMENT BB:
* synchronisation and lock-out were both described as vaguely the same
  problem,
* synchronisation wasn't explained.
Gorry: Agree completely; this should be added as a subsection in section 2.
This was overlooked.
ACTION:
Section 2 now calls out "control loop synchronisation" both as a problem
and area to be addressed.
        Congestion control, like other end-to-end mechanisms, introduces
        a control loop between hosts. Sessions that share a common network
        bottleneck can therefore become synchronised, introducing periodic
        disruption (e.g. jitter/loss). "Lock-out" is often also the result
        of synchronisation or other timing effects.
Section 4 recommends:
        Procedures for dropping or marking packets within the network
        need to avoid increasing synchronisation events, and hence
        randomness SHOULD be introduced in the algorithms that
        generate these congestion signals to the endpoints.
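
A minimal sketch of what "introducing randomness" can look like (an
editorial illustration, not an algorithm taken from the draft): instead of
dropping every arrival once a threshold is crossed, each packet is dropped
with a probability that rises with queue length, so flows sharing the queue
are less likely to all be signalled in the same instant. The ramp is loosely
modelled on RED-style thresholds but uses the instantaneous queue length for
brevity; the names min_th, max_th and max_p and their values are assumptions.

```python
import random


def drop_probability(queue_len, min_th=5, max_th=15, max_p=0.1):
    """Linear ramp: 0 below min_th, up to max_p just under max_th, 1.0 at or above."""
    if queue_len < min_th:
        return 0.0
    if queue_len >= max_th:
        return 1.0
    return max_p * (queue_len - min_th) / (max_th - min_th)


def should_drop(queue_len, rng, **params):
    """Per-packet random decision based on the current queue length."""
    return rng.random() < drop_probability(queue_len, **params)


rng = random.Random(1)  # seeded only to make the sketch repeatable
trials = 10_000
drops = sum(should_drop(10, rng) for _ in range(trials))
print(drops / trials)  # close to 0.05 for a queue length of 10
```

Because each decision is independent, two flows arriving at the same queue
depth do not necessarily lose packets in the same round trip.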
---
5) Large Packet Lock-Out
COMMENT BB:
* lock-out wasn't explained, but it was said to be vaguely to do with
  synchronisation, and solving it would help low-capacity bursty flows.
  (Comment on bullet 3 in Sect 2.)
* large-packet lock-out problems weren't mentioned, only flow-level lock-out.
ACTION:
Description expanded to:
        In some situations tail drop allows a single connection or a
        few flows to monopolize the queue space, starving other connections
        by preventing them from getting room in the queue. The simplest
        mechanism starts with a new or building session attacking a queue
        that is full. One or more sessions, following algorithms similar to
        those of [RFC5681], maximize their effective window, maximizing
        their impact on a queue somewhere in the network and the effect of
        that queue on both their own latency and that of competing sessions.
        This also maximizes the probability of loss from that queue. A new
        session, sending its initial burst, has an enhanced probability of
        filling the remaining queue and dropping packets. As a result, the
        new session can be prevented from sharing the queue effectively
        for a period of many RTTs. One objective of AQM is to minimize the
        effect of lock-out by minimizing mean queue depth and therefore the
        probability that competing sessions can materially prevent each
        other from performing well.
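
The lock-out effect described above can be sketched with a toy tail-drop
queue (an editorial illustration under simplifying assumptions, not a model
from the draft): an established flow keeps a standing queue, and a new
session offers its initial burst. The function name, capacity and burst
sizes are hypothetical, and no packets are dequeued during the burst.

```python
from collections import deque


def offer_burst(queue, capacity, flow, burst):
    """Tail drop: accept arrivals until the queue is full, drop the rest.

    Returns (accepted, dropped) for the offered burst.
    """
    accepted = 0
    for _ in range(burst):
        if len(queue) < capacity:
            queue.append(flow)
            accepted += 1
    return accepted, burst - accepted


CAPACITY = 20

# Established flow holds the queue almost full: the new burst is mostly lost.
nearly_full = deque(["old"] * 19)
print(offer_burst(nearly_full, CAPACITY, "new", 10))  # (1, 9)

# With a short standing queue the same burst fits entirely.
short_queue = deque(["old"] * 5)
print(offer_burst(short_queue, CAPACITY, "new", 10))  # (10, 0)
```

The second case is the situation the text aims for: keeping mean queue depth
low leaves room for a new session's initial burst.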
---
6) Conex
COMMENT BB: "there is currently no consensus solution to controlling the
congestion caused by such aggressive flows; significant research and
engineering will be required before any solution will be available. It is
imperative that this work be energetically pursued, to ensure the future
stability of the Internet. "
The draft could at least mention congestion policing and ConEx RFCs coming
out of the IETF right now (e.g. RFC6789 and
draft-ietf-conex-abstract-mech, which is with the IESG).
ACTION:
Added:
        Methods such as congestion exposure (ConEx) [RFC6789]
        offer a framework [CONEX] that can update network devices
        to alleviate these effects. Significant research and
        engineering will be required before any solution will
        be available. It is imperative that work to mitigate
        the impact of unresponsive flows is energetically pursued,
        to ensure the future stability of the Internet.
---
7) Refs
Comment WE: The Choi04 reference got messed up in this (authors' names are
omitted).
ACTION:
This was an XML problem, fixed in the revision.
---
8) hyperlinks
Comment DT: My only major objection at this point was that I would like
*all* the references to have hyperlinks to the referenced material.
ACTION:
Additional reference material can be provided. The RFC-Ed does not
normally publish URLs for citations in RFCs.
---
9) bursts
Comment DT: talking about the problems induced by bursty loss might be good.
ACTION:
No specific text has currently been added.
---

We plan to submit these changes (and a few editorial corrections) in
revision -05. If you have comments on the text or suggestions, do let us
know. Also let us know if anything important was overlooked and we'll see
if we can address this in rev -05 as well.

best wishes,

Fred and Gorry


_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
