I agree with most of the suggested features that should be tested in
AQM evaluation. However, I have some doubts whether the proposed
experiments/metrics are really applicable and able to reveal the
required features.
My comments in detail:
Section 2.1: Flow completion time:
It is applicable only to equally sized finite flows; it is not really
meaningful for variable-sized flows (e.g. the Tmix trace) and not
applicable to infinite flows.
Section 2.2: Packet loss:
- Long term loss probability is meaningful only in a steady state
scenario. And it characterizes the TCP flavor, not the AQM. (Loss
probability remains the same, whatever you do with AQM, as long as you
reach roughly the same throughput.)
- Interval between consecutive losses: If the losses are well spaced,
this conveys roughly the same information as the loss probability. If
they are not well spaced (bursty), what should be recorded?
- Packet loss patterns: The metric is undefined, except for the special
case of "packet loss synchronization" in the next section, 2.3. It is
indeed highly interesting qualitatively in non-stationary cases, e.g. an
abrupt capacity drop. But how should it be quantified?
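The first point above, that the long-term loss probability is pinned
down by the achieved throughput rather than by the AQM, can be
illustrated with the well-known Mathis approximation
T ~ MSS/(RTT*sqrt(p)). The sketch below is my own illustration, not
from the draft; all numbers are purely illustrative.

```python
def loss_prob_for_throughput(throughput_bps, mss_bytes=1500, rtt_s=0.1):
    """Invert the Mathis approximation T = MSS/(RTT*sqrt(p)):
    p = (MSS/(RTT*T))**2. Whatever AQM produced the drops, a flow
    that reaches throughput T must have seen roughly this p."""
    mss_bits = mss_bytes * 8
    return (mss_bits / (rtt_s * throughput_bps)) ** 2

# Two hypothetical AQMs that both let the flow reach ~10 Mbit/s
# necessarily impose (about) the same loss probability:
p = loss_prob_for_throughput(10e6)
print(f"implied loss probability: {p:.6f}")  # -> 0.000144
```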
Section 2.4: Goodput:
- Meaningful only with steady-state occupancy by a number of more or
less greedy TCP flows. Here it shows to what extent the AQM is able
to keep the link close to 100% utilization.
- With a trace of variable-sized flows (Tmix) the goodput simply
mirrors the offered traffic (as long as the total stays below the link
capacity, i.e. no overload).
- The overload scenario does not reach a steady state. Goodput in
overload cases depends heavily on factors other than the AQM, e.g. test
duration or shuffling of the trace.
Section 2.6: Trade-off latency vs. goodput:
The section refers to two (x,y) plots of the form:
X=delay(parms)
Y=goodput(parms)
and
X=delay(parms)
Y=drop_ratio(parms)
where <parms> are tuples of parameter values, each describing
one experimental set-up.
It remains unclear what <parms> might be in the context of the given
document. The cited document [TCPEVAL2013] suggests that one dimension
of <parms> might be the scaling of the applied Tmix trace. The other
parameters in the cited document are not applicable here. More on this
below.
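To make my reading of section 2.6 concrete: each <parms> tuple is one
experimental set-up yielding one measurement point, and the two plots
are scatter plots over all set-ups. The sketch below is mine; the
parameter names and numbers are purely hypothetical, with Tmix scaling
as the only varying dimension, as [TCPEVAL2013] suggests.

```python
# Hypothetical results table: key = <parms> tuple describing one set-up,
# value = measured (delay in ms, goodput in Mbit/s, drop ratio).
results = {
    ("tmix_scale=0.5",): (12.0, 4.8, 0.001),
    ("tmix_scale=1.0",): (25.0, 9.5, 0.004),
    ("tmix_scale=1.5",): (60.0, 9.9, 0.015),
}

# The two (x, y) plots of section 2.6 are then just these point sets:
delay_vs_goodput = [(d, g) for d, g, _ in results.values()]
delay_vs_drops = [(d, r) for d, _, r in results.values()]
print(delay_vs_goodput)
print(delay_vs_drops)
```

With a single set-up (as in sections 4.1-4.3) each point set collapses
to one dot.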
Section 4.1: TCP-friendly sender:
This requires the plots according to section 2.6. But at the same time
it specifies one single long-lived, non-application-limited flow, which
yields just one single dot in each of the plots.
Section 4.2: Aggressive Transport Sender: same problem
Section 4.3: Unresponsive Transport: same problem; moreover:
The scenario is only applicable to scheduling, not to AQM. The
described traffic simply overloads the link without responding to the
AQM (which, to my understanding, is the meaning of unresponsive
traffic). A "long-lived, non-application-limited UDP flow" is
effectively infinite, unlike its TCP counterpart.
I would suggest a different test here: In a mixture of responsive and
unresponsive traffic, test to what extent the AQM scheme is still able
to keep the responsive fraction under control. This requires that the
unresponsive traffic stays well below the capacity limit. The rationale
behind this test is that an AQM scheme might under- or over-react if it
drops packets but does not see the expected rate reduction.
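A concrete shape for the proposed test could look as follows (the
numbers are mine and purely illustrative): unresponsive UDP occupies a
fixed fraction of the bottleneck, well below capacity, so the
responsive TCP flows always have room to react.

```python
CAPACITY_MBPS = 10.0  # hypothetical bottleneck capacity

# (fraction of capacity taken by unresponsive UDP, number of TCP flows)
scenarios = [(0.1, 5), (0.3, 5), (0.5, 5)]

# Capacity left over for the responsive fraction in each scenario:
tcp_budgets = [CAPACITY_MBPS * (1.0 - udp_frac) for udp_frac, _ in scenarios]
for (udp_frac, n_tcp), budget in zip(scenarios, tcp_budgets):
    print(f"UDP {udp_frac * CAPACITY_MBPS:.1f} Mbit/s unresponsive, "
          f"{budget:.1f} Mbit/s left for {n_tcp} TCP flows")

# Precondition of the test: the unresponsive share never saturates the link.
assert all(b > 0 for b in tcp_budgets)
```

The pass criterion would then be whether the measured TCP aggregate
stays close to its budget under the AQM, despite the drops also hitting
the unresponsive flow without effect.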
Section 4.4: Initial Congestion Window:
This makes sense only with a mix of short-lived flows; for long-lived
flows the IW does not matter. Alternatively, a single experiment with a
pre-existing long-lived flow and a newly appearing IW3/IW10 flow could
be executed. But there is no reference to any traffic mix; the table
specifies just 2 flows in parallel.
What are the <parms> for the graphs according to section 2.6?
Section 4.5: Traffic Mix
The section defines its own traffic mixes in a table, but requires the
graphs according to 2.6, which somehow implies the Tmix traffic.
Section 6: Burst absorption
Same as above: the test comes with its own traffic mix but requests
graphs according to 2.6, thus implying Tmix.
The proposed bursty scenarios seem not specific enough when compared
with 4.5.
For reproducibility I would propose something like UDP on/off
background traffic here.
Section 7: Stability
This section mixes two different things:
(a) The impact of general drop rate by other cross traffic, which is
unrelated to the bottleneck link.
(b) Reaction to varying link capacity at the bottleneck.
The general drop rate experiment (a) is weakly specified: If the drop
rate is too high, the bottleneck capacity cannot be reached and the AQM
does not matter. Conversely, if the drop rate is too low, the AQM
algorithm dominates the drop process and the background drops do not
really matter. Only the transition between the two regimes could be of
interest. But how does one get there, and is it of relevance in
practice?
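The transition point can be estimated with a back-of-envelope
calculation based on the Mathis approximation (my own sketch, not from
the draft; numbers are illustrative): a Reno-like flow can fill
capacity C only if the total drop rate stays near
p_c = (MSS/(RTT*C))**2.

```python
def critical_drop_rate(capacity_bps, mss_bytes=1500, rtt_s=0.1):
    """Drop rate at which the Mathis throughput MSS/(RTT*sqrt(p))
    equals the link capacity."""
    return (mss_bytes * 8 / (rtt_s * capacity_bps)) ** 2

p_c = critical_drop_rate(10e6)  # hypothetical 10 Mbit/s bottleneck
for p_bg in (p_c / 10, 10 * p_c):
    # Background drops well below p_c: the AQM's own drops dominate.
    # Well above p_c: the link cannot be filled and the AQM sits idle.
    regime = "AQM dominates" if p_bg < p_c else "link cannot be filled"
    print(f"background drop rate {p_bg:.2e}: {regime}")
```

Only background drop rates in the narrow band around p_c would exercise
the interplay between AQM drops and external drops at all.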
The varying capacity experiment (b) is really relevant. I ask myself
whether there could be resonance effects in the AQM parameter
adaptation algorithms, and how to test for them.
Wolfram Lautenschläger
Alcatel-Lucent
Bell Labs