Hi,

On 10.08.2015 15:43, Wesley Eddy wrote:
> As chairs, Richard and I would like to start a 2-week working
> group last call on the AQM characterization guidelines:
>
> https://datatracker.ietf.org/doc/draft-ietf-aqm-eval-guidelines/
>
> Please make a review of this, and send comments to the list or
> chairs. Any comments that you might have will be useful to us,
> even if it's just to say that you've read it and have no other
> comments.
"Unfortunately", we (Polina and I) did a thorough review, which is
attached. TL;DR: from our point of view the I-D needs a major revision.

Regards,
 Roland
I completed my review for draft-ietf-aqm-eval-guidelines-07 and
discussed it also with Polina, who did her own review, which we
eventually aggregated here. We both think that this document needs a
major revision due to the number of issues we identified.

Major issues:
-------------

1) Structure, overview, rationale and requirements
   The structure should/could be improved. The goal and methodology
   should be put first. Some motivation given in Section 14 should be
   moved to the beginning, e.g., the goal of this document is stated in
   Section 14.3.

2) It is unclear whether the tests from Sections 4-9 should be carried
   out without or with ECN. Section 12 discusses this much too late.

3) The overall number of tests and parameter combinations is really
   high.

4) Of the discussed end-to-end metrics, only latency/goodput metrics
   are used in the scenarios, and for some of the scenarios these
   metrics are not suitable to show the desired behavior.

5) Some sections in this document (e.g., 7.3, 10, 13) specify
   requirements for an AQM standard(/draft) and not requirements for a
   performance evaluation, so these sections should be moved to
   [draft-ietf-aqm-recommendation].

6) Related work: There are several works that deal with the evaluation
   of TCP or congestion control performance:
   - RFC 5166 (Metrics for the Evaluation of Congestion Control
     Mechanisms), https://tools.ietf.org/html/rfc5166, is IMHO highly
     relevant but neither referenced nor discussed.
   - Yee-Ting Li, Douglas Leith, and Robert N. Shorten. 2007.
     Experimental evaluation of TCP protocols for high-speed networks.
     IEEE/ACM Trans. Netw. 15, 5 (October 2007), 1109-1122.
     DOI=10.1109/TNET.2007.896240
     http://dx.doi.org/10.1109/TNET.2007.896240
   - Andrew et al.: Towards a Common TCP Evaluation Suite, Proceedings
     of the International Workshop on Protocols for Fast Long-Distance
     Networks (PFLDnet), Manchester, United Kingdom, March 2008

Detailed comments per section:
==============================
(the %%%%%% just separates different issues within the section comments)

{Section 1}
-----------
   AQM schemes aim at reducing mean buffer occupancy, and therefore
   both end-to-end delay and jitter.
=> Is this true for every AQM?

%%%%%%
   In real implementations of switches, a global memory is shared
   between the available devices:
This may be a common architecture nowadays, but it will not necessarily
always be the case...
=> In real implementations of switches, a global memory is _often_
   shared between the available devices:

%%%%%%
   the size of the buffer for a given communication does not make sense
... and then ...
   The rest of this memo therefore refers to the maximum queue depth as
   the size of the buffer for a given communication.
=> I don't understand what you mean here. First you say it doesn't make
   sense, then you define the maximum queue depth as exactly the size
   of the buffer for a given communication.
   - Do you mean buffer occupancy?
   - Is "communication" here an end-to-end data flow or an aggregated
     flow?
   - The term "maximum queue depth" is never used in the document
     again... but "maximum queue size" and "maximum buffer size" are.
I think it is essential to understand the difference between the buffer
size and the buffer occupancy that the AQM tries to control. Due to
shared memory architectures the buffer size may not be fixed and may
thus vary for a given interface. Is the buffer (size) here meant in
both directions for bidirectional traffic?
%%%%%%
   Bufferbloat [BB2011] is the consequence of deploying large unmanaged
   buffers on the Internet, which has lead to an increase in end-to-end
   delay: the buffering has often been measured to be ten times or
   hundred times larger than needed.
Large buffers per se are not a real problem unless combined with TCP
bandwidth probing or unresponsive flows that fill the buffers.

%%%%%%
   The Active Queue Management and Packet Scheduling Working Group (AQM
   WG) was recently formed within the TSV area to address the problems
   with large unmanaged buffers in the Internet. Specifically, the AQM
IMHO this and the following paragraphs should be rephrased so that the
statement is still true some years after the WG has concluded...

%%%%%%
Missing: the use of ECN is also an incentive to use/deploy AQMs.

{Section 1.1}
-------------
   The trade-off between reducing the latency and maximizing the
   goodput
=> Goodput isn't defined at its first use; probably add a forward
   reference to Section 2.5 and/or put it in the glossary (Section
   1.4).

   This document provides guidelines that enable the reader to quantify
   (1) reduction of latency, (2) maximization of goodput and (3) the
   trade-off between the two.
=> This should be moved into Section 1.2, but it seems to be redundant
   with its first sentence anyway:
   The guidelines help to quantify performance of AQM schemes in terms
   of latency reduction, goodput maximization and the trade-off between
   these two.

%%%%%%
   These guidelines provide the tools to understand the deployment
   costs ...
=> I doubt that anything is said about deployment _costs_ in the draft.
   Section 14.3.2 discusses some aspects w.r.t. handling the AQM in
   practice, but not really deployment costs...

{Section 1.2}
-------------
   The guidelines also help to discuss safe deployment of AQM,
   including self-adaptation, stability analysis, fairness, design and
   implementation complexity and robustness to different operating
   conditions.
=> These terms should be explained before they are actually used.
%%%%%%
   This memo details generic characterization scenarios against which
   any AQM proposal needs to be evaluated
=> *needs* sounds a bit strange

%%%%%%
   This document details how an AQM designer can rate the feasibility
   of their proposal in different types of network devices (switches,
   routers, firewalls, hosts, drivers, etc) where an AQM may be
   implemented
=> There is nothing specific about firewalls, hosts, and drivers in the
   rest of the document. The proposed test topology considers routers
   only.

{Section 1.3}
-------------
AQM: should be expanded at least once here.

{Section 1.4}
-------------
Strictly speaking, "queue" should be defined here, too.

{Section 2.1}
-------------
   FCT [s] = Fs [B] / ( G [Mbps] / 8 )
=> Please use unambiguous units instead of B and bps:
   FCT [s] = Fs [Byte] / ( G [Bit/s] / 8 [Bit/Byte] )
=> Goodput of a flow is defined in Section 2.5 but referenced here.
=> Can one really speak of goodput for a flow that is 10-100 packets
   long? It probably makes more sense to measure the FCT directly.

%%%%%%
   If this metric is used to evaluate the performance of web transfers,
   *we propose*
=> Avoid "we", e.g., replace it with "it is suggested".
=> (Considering Section 6.2 too) It might be a good idea to standardize
   how to generate web traffic and what metric to measure. Consider,
   for example, how web traffic is generated in "Experimental
   evaluation of TCP protocols for high-speed networks".

{2.2. Flow start up time}
This metric is not used in the later tests...

{2.3 Packet loss}
   Packet loss can occur within a network device, this can impact the
   end-to-end performance measured at receiver.
=> It can also occur at the sender and the receiver...

%%%%%%
This metric is not used in the later tests... (except indirectly in the
goodput). Measuring packet loss is probably essential since
retransmissions can also be triggered by reordering. Furthermore,
packet loss caused by the AQM through packet drops should be measured
separately (in order to find out whether other drops happened
elsewhere).
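Coming back to the FCT formula quoted under Section 2.1: a minimal
sketch of the unit-explicit computation we suggest (the example values
are made up, not taken from the draft):

```python
# FCT with explicit units, as suggested for Section 2.1:
#   FCT [s] = Fs [Byte] / ( G [Bit/s] / 8 [Bit/Byte] )
# The example values below are hypothetical, not from the draft.

def fct_seconds(flow_size_byte, goodput_bit_per_s):
    """Flow completion time in seconds."""
    return flow_size_byte / (goodput_bit_per_s / 8.0)

# A short 15-packet flow of 1500-byte packets at 10 Mbit/s goodput:
print(fct_seconds(15 * 1500, 10e6))  # 0.018 s
```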
%%%%%%
   The tester SHOULD evaluate loss experienced at the receiver using
   one
=> This may be misleading if the cause of the loss isn't clear... see
   above.

{Section 2.5}
-------------
   number of bits per unit of time forwarded to the correct destination
   interface of the Device Under Test or the System Under Test, minus
=> Are "Device Under Test" and "System Under Test" universally known
   terms? They are defined in RFC 2544.

   Additionally, an AQM scheme may resort to Explicit Congestion
   Notification (ECN) ...
=> cf. major issue 2.

{Section 2.6}
-------------
One-way delay as discussed in RFC 2679 is a little bit more precise
since it also specifies at which layer the delay is measured
(Type-P-One-way-Delay). I guess that we want to consider the IP packet
delay?!

%%%%%%
Typo:
- There is a consensus on a adequate metric for the jitter, that
  ----
+ There is a consensus on an adequate metric for the jitter, that

%%%%%%
   The end-to-end latency differs from the queuing delay: it is linked
   to the network topology and the path characteristics.
=> This reads a bit strange to me: the queuing delay is part of the
   end-to-end latency (together with signal propagation delay,
   transmission delay and processing delay).
=> What exactly is meant by path characteristics here? Is that the
   fixed delay portion, i.e., signal propagation delay and transmission
   delay?

%%%%%%
   Moreover, the jitter also strongly depends on the traffic pattern
   and the topology.
=> I'm not sure how jitter depends on the topology. Jitter is usually
   caused by variations in queuing and processing delay (e.g.,
   scheduling effects and so on).

%%%%%%
   The introduction of an AQM scheme would impact these metrics and
=> These metrics are: one-way delay and one-way delay variation?

{Section 2.7}
-------------
   With regards to the goodput, and in addition to the long-term
   stationary goodput value, it is RECOMMENDED to take measurements
   every multiple of RTTs.
   We suggest a minimum value of 10 x RTT (to
=> "every multiple of RTTs" is probably a bad recommendation since the
   RTT is variable due to queuing delay. minRTT would probably be ok.

%%%%%%
   smooth out the fluctuations) but higher values are encouraged
=> What does "higher" mean here? More frequently? (If so, please
   rephrase.)

%%%%%%
   From each of these sets of measurements, the CDF of the considered
=> Please expand CDF at least once.

%%%%%%
   This graph provides part of a better understanding of (1) the delay/
   goodput trade-off for a given *congestion control mechanism*,
+  AQM scheme
   and (2) how the goodput and *average queue size* vary as a function
   of the traffic load.
=> In order to see how something varies as a function of the traffic
   load, one should perform measurements for different traffic loads,
   which is not done in every scenario.
=> "average queue size" should probably be replaced with delay.

%%%%%%
   the goodput and ellipses are computed such as detailed in [WINS2014].
=> Since nearly every one of the following tests recommends plots
   according to this graph, please write it up here. Maybe Keith's
   thesis is accessible for some years, but it would be good to
   document such a central element within the draft/RFC itself.

{Section 3.1}
-------------
In the figure:

   +   +-+---+---+    +--+--+---+   +
   |   | |Router L |  |Router R |   |
   |   | |---------|  |---------|   |
   |   | | AQM     |  |         |   |
   |   | | BuffSize|  |         |   |
   |   | | (Bsize) +-----+      |   |
   |   +-----+--++   ++-+------+    |
   +         |  |     |  |          +

it is unclear to me what these lines here between the traffic class
boxes mean:

             |  |     |  |
             |  |     |  |

Maybe replace them by an ellipsis . . .
=> Moreover, what about the buffers in Router R? They are assumed to be
   empty, I guess ...

%%%%%%
   o various classes of traffic can be introduced;
=> I would avoid "traffic class" since this can easily be confused with
   diffserv classes. Later in the document they are called "traffic
   profiles", which I find a more suitable term (then use it
   consistently throughout the document).
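To make our Section 2.7 comment above concrete: this is one possible
write-up of the per-interval goodput measurement and its CDF as we
understand it (the 10 x minRTT window, the input format and the sample
values are our assumptions, not the draft's definition):

```python
# Sketch of per-interval goodput sampling and its CDF (cf. our comments
# on Section 2.7).  Assumptions (ours, not the draft's): sampling
# windows of 10 x minRTT; input as (timestamp [s], bytes received since
# the previous sample).

def interval_goodputs(samples, window_s):
    """Goodput [Bit/s] per window of length window_s."""
    goodputs, acc_bytes, t_start = [], 0, samples[0][0]
    for t, nbytes in samples[1:]:
        acc_bytes += nbytes
        if t - t_start >= window_s:
            goodputs.append(acc_bytes * 8 / (t - t_start))
            acc_bytes, t_start = 0, t
    return goodputs

def empirical_cdf(values):
    """Sorted (value, cumulative probability) pairs."""
    v = sorted(values)
    return [(x, (i + 1) / len(v)) for i, x in enumerate(v)]

# Two 1-second windows at a steady 10 Mbit/s:
samples = [(0.0, 0), (0.5, 625000), (1.0, 625000),
           (1.5, 625000), (2.0, 625000)]
print(interval_goodputs(samples, 1.0))  # [10000000.0, 10000000.0]
```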
%%%%%%
   o various classes of traffic can be introduced;
=> Better rephrase to:
   o senders with different traffic characteristics (i.e., traffic
     profiles) can be introduced;

%%%%%%
   o each link is characterized by a couple (RTT,capacity);
=> Better one-way delay instead of RTT? Probably the links are
   symmetric or asymmetric...

%%%%%%
   o flows are generated between A and B, sharing a bottleneck (Routers
     L and R);
=> "generated between A and B" is weird, and the bottleneck is the
   _link_ between L and R, so:
   o flows are generated at A and sent to B, sharing a bottleneck (the
     link between routers L and R);

%%%%%%
   AQM mechanism whereas the asymmetric link scenario evaluates an AQM
   mechanism in a more realistic setup;
=> This sounds like only DSL scenarios are a realistic setup... please
   consider the usefulness of AQM also in other networks, e.g., even in
   data centers...

%%%%%%
- an AQM scheme when comparing this scheme with a new proposed AQM
  ----
+ an AQM scheme when comparing this scheme with a newly proposed AQM

{Section 3.2}
-------------
   The size of the buffers should be carefully chosen, and is to be set
   to the bandwidth-delay product
=> The bandwidth-delay product between which points exactly? A and B or
   L and R?
=> Buffer and buffer size are defined as the whole buffer size
   available for a device. Is it enough for bidirectional traffic?

%%%%%%
   capacity and the delay the larger RTT in the considered network. The
=> the largest RTT?

%%%%%%
- size of the buffer can impact on the AQM performance and is a
  ----
+ size of the buffer can impact the AQM performance and is a

{Section 3.3}
-------------
   This memo features three kind of congestion controls:
=> Sounds a bit strange. Maybe something like: This document considers
   running three different congestion control algorithms.

   between this category is TCP Cubic.
=> A reference would be good here...

{Section 4.}
------------
This section reads more like congestion control evaluation...
%%%%%%
   Network and end-devices need to be configured with a reasonable
   amount of buffer space to absorb transient bursts. In some
   situations, network providers tend to configure devices with large
   buffers to avoid packet drops triggered by a full buffer and to
   maximize the link utilization for standard loss-based TCP traffic.
=> This whole paragraph belongs more to Section 3.2. Moreover, one
   needs to evaluate several operation points (parameter settings) to
   see the AQM behavior in a goodput/delay graph. One must change
   variables that really change the behavior (there are enough papers
   that vary the buffer size, which would be pretty useless for AQMs
   like PIE or CoDel).

   be configured with a reasonable amount of buffer space to absorb
   transient bursts.
=> What is "reasonable" now? Usually the BDP is recommended, but this
   may be highly variable, too...

%%%%%%
   TCP is a widely deployed transport. It fills up *unmanaged* buffers
   until a sender transfering a bulk flow with TCP receives a signal
   (packet drop) that reduces the sending rate.
=> Suggestion: replace "unmanaged" buffers by "available" buffers,
   because TCP will fill managed buffers too, until the sender receives
   a congestion signal.

{Section 4.1.1}
---------------
It would be good to describe the objectives of the test, i.e., the
rationale and the expected AQM/TCP behavior.

%%%%%%
   friendly transport sender. A single long-lived, non application-
   limited, TCP NewReno flow, with an Initial congestion Window (IW)
   set
=> Explicitly defining what "non application-limited" means exactly
   wouldn't hurt. For instance, an application could be bandwidth or
   rate limited or also (sending) window limited.

%%%%%%
   For each TCP-friendly transport considered, the graph described in
   Section 2.7 could be generated.
=> I guess the latency vs. goodput graph is meant here...

{Section 4.1.2}
---------------
   For this scenario, two types of flows MUST be generated between
=> Yes, so two types doesn't necessarily mean only two flows (cf.
   SEN.Flow1.1 ...
   SEN.Flow1.X).

   o A single long-lived application-limited TCP NewReno flow, with an
     IW set to 3 or 10 packets. The size of the data transferred must
     be strictly higher than 10 packets and should be lower than 100
     packets.
=> What does "long-lived" mean?
=> I doubt that 100 packets is really long-lived!
=> Are these 1500-byte packets?

   For each of these scenarios, the graph described in Section 2.7
   could be generated for each class of traffic (application-limited
   and non application-limited).
=> What exactly is the goal of this metric? Does a delay/throughput
   graph make sense for a 10-100 packet flow? According to the section
   title, the goal is likely to assert how fast the two flows converge
   to a fair share depending on the IW of the second flow. In this
   case, both the scenario and the metric are not very significant.
   According to the scenario itself, the goal could be to assert the
   flow completion time of short flows in the presence of background
   flows. In this case, these metrics should be reflected on a result
   graph. It is probably useful to add a metric without background
   flows as a reference point.

{Section 4.3}
-------------
- to keep responsive fraction under control. This scenario considers a
  -----
+ to keep the responsive fraction under control. This scenario
  considers a

%%%%%%
   sender A and receiver B. As opposed to the first scenario, the rate
   of the UDP traffic should not be greater than the bottleneck
   capacity, and should not be higher than half of the bottleneck
   capacity. For each type of traffic, the graph described in
=> It is not clear why the UDP flow shouldn't be larger than half of
   the bottleneck capacity. If it had 75% of the bottleneck capacity,
   one could see whether the AQM is able to squeeze it down to 50%
   while allowing the TCP flow to get the other half.
=> Again, what is the goal of this scenario? It looks like the scenario
   aims at showing what share of the bandwidth the TCP flow receives.
   In this case, the results are better illustrated by a fairness
   index, or by two throughput bars, and not by the delay/throughput
   trade-off.

{Section 4.4}
-------------
- Single long-lived non application-limited TCP NewReno flows transfer
  ------
+ A single long-lived non application-limited TCP NewReno flow
  transfers

%%%%%%
   sender A and receiver B. We recommend to set the target delay and
   gain values of LEDBAT respectively to 5 ms and 10 [TRAN2014]. Other
=> 10 ms? That would however be RTT-dependent, i.e., if the topology
   has much lower RTTs, then these values must be adapted
   accordingly...
=> Again, the choice of metrics is questionable.

{Section 5}
-----------
See also Section 2.3 of RFC 5166.

{Section 5.1}
-------------
   The ability of AQM schemes to control the queuing delay highly
   depends on the way end-to-end protocols react to congestion signals.
=> I don't think that this is true in every case. Some AQMs also
   control the queuing delay even for completely unresponsive flows.
   Therefore, "highly depends" is a bit overstated...

%%%%%%
   for a set of RTTs (e.g., from 5 ms to 200 ms).
=> RTTs between A and B or between R and L?

%%%%%%
   Introducing an AQM scheme may cause the unfairness between the
   flows, even if the RTTs are identical. This potential unfairness
   SHOULD be investigated as well.
=> If it should, it could be defined as an intra-protocol fairness test
   in Section 4 (IMHO between 4.2 and 4.3).

{Section 5.2}
-------------
   o To evaluate the impact of the RTT value on the AQM performance and
     the intra-protocol fairness (the fairness for the flows using the
     same paths/congestion control), for each run, two flows (Flow1 and
     Flow2) should be introduced. For each experiment, the set of RTT
     SHOULD be the same for the two flows and in [5ms;560ms].
=> This is not evaluating RTT fairness, since both flows use the same
   RTT; it probably rather evaluates the sensitivity to different RTTs.
=> (Forward-referencing 5.3) The metric of choice for this scenario
   (cumulative average goodput of the two flows) definitely doesn't
   show whether the flows are fair to each other.

{Section 5.3}
-------------
See also RFC 5166, Section 2.3.3, "Fairness and round-trip times".

{Section 6.1}
-------------
   An AQM scheme can result in bursts of packet arrivals due to various
   reasons. Dropping one or more packets from a burst can result in
=> I don't get this. TCP or applications usually send/generate bursts,
   but AQM schemes?

%%%%%%
   An AQM scheme that maintains short queues allows some remaining
   space in the queue for bursts of arriving packets.
=> Should this be (?): some remaining space in the buffer for bursts
   of ...

%%%%%%
- directly linked to the AQM algorithm. Moreover, one AQM scheme may
  ----
+ directly linked to the AQM algorithm. Moreover, an AQM scheme may

{Section 6.2}
-------------
   o Bursty video frames;
=> How? What? Congestion controlled? Application-limited/rate-limited
   streaming?

%%%%%%
- o Constant bit rate UDP traffic.
  ----
+ o Constant bit rate (CBR) UDP traffic.
=> At which rate, BTW?

%%%%%%
   o A single bulk TCP flow as background traffic.
=> Non-application-limited?

%%%%%%
Figure 2:
- | |Video|Webs (IW 10)| CBR| Bulk TCP Traffic |
  ----
+ | |Video|Web (IW 10)| CBR| Bulk TCP Traffic |

%%%%%%
Probably it would make sense to join this with Section 8 (which also
needs a more precise workload description).

%%%%%%
   For each of these scenarios, the graph described in Section 2.7
   could be generated. Metrics such as end-to-end latency, jitter, flow
=> For each of these scenarios, ... the graph for every flow could be
   generated?
=> It is not obvious why these scenarios evaluate burst absorption, so
   an explanation of what should/could be expected would be
   appreciated.

{Section 7.2}
-------------
   application-limited TCP flows.
   For each of the below scenarios, the results described in Section
   2.7 SHOULD be generated. For
=> Replace "results" with "graphs"?
=> For the throughput of many flows, is it the cumulative or the
   average throughput?
=> One problem with the suggested output is that these metrics do not
   show time dependencies, which are important to see when analyzing
   transient behavior.

{Section 7.2.5}
---------------
Why not also consider I, II, III, II, I, ...?

{Section 7.2.6}
---------------
- reflect the exact conditions of Wi-Fi environments since its hard to
  ----
+ reflect the exact conditions of Wi-Fi environments since it is hard
  to

%%%%%%
   o Experiment 1: the capacity varies between two values within a
     large time-scale. As an example, the following phases may be
     considered: phase I - 100Mbps during 0-20s; phase II - 10Mbps
     during 20-40s; phase I again, and so on.
=> Are 20 s really large enough? Sometimes TCP needs several seconds
   until it finds the available bandwidth.

%%%%%%
- The scenario consist of TCP NewReno flows between sender A and
  -----
+ The scenario consists of TCP NewReno flows between sender A and

%%%%%%
   behavior, the tester MUST compare its performance with those of
   drop-tail and SHOULD provide a reference document for their proposal
=> Isn't a comparison to drop-tail (with a buffer of size BDP) also
   relevant for the earlier described tests?
=> Irrespective of Wi-Fi: for the traffic load there is first a set of
   tests with different stable conditions and then a test with a
   transient condition. For the RTT there is a test that evaluates the
   AQM's behavior for different RTTs in Section 5.2. Is there any
   reason why several different bottleneck capacities are not
   considered?

{Section 7.3}
-------------
This section describes more general remarks that probably belong in an
earlier section...
The theoretical analysis belongs to the AQM specification, and thus
this whole section should probably better be moved to
[draft-ietf-aqm-recommendation]. This document can include tests of
whether the theoretical analysis is "valid" in practice.

{Section 8.1}
-------------
Traffic mix = a mix of streams with different traffic profiles?

%%%%%%
- Webs pages download (such as detailed in Section 6.2); 1 CBR; 1
  ----
+ Web pages download (such as detailed in Section 6.2); 1 CBR; 1

{Section 9.2}
-------------
"We recommend" (2 times)
=> "We" should be avoided.

{Section 10.1}
--------------
   scheme on a particular hardware or software device. This also helps
   the WG understand which kind of devices can easily support the AQM
   and which cannot.
=> As already commented earlier: this document is hopefully useful
   beyond the WG...
=> This belongs more to the requirements for an AQM proposal, and thus
   this section should be moved to [draft-ietf-aqm-recommendation],
   too.

{Section 11.1}
--------------
   Additionally, the safety of an AQM scheme is directly related to its
   stability under varying operating conditions such as varying traffic
   profiles and fluctuating network conditions, as described in Section
   7. Operating conditions vary often and hence the AQM needs to remain
   stable under these conditions without the need for additional
   external tuning. If AQM parameters require tuning under
=> This could also be mentioned in/moved to Section 7...

   A minimal number of control parameters minimizes the number of ways
   a *possibly naive* user can break a system where an AQM scheme is
   deployed at.
=> This sounds a little bit strange, so better remove *possibly naive*.

{Section 11.2}
--------------
=> Required discussion vs. Recommended discussion?
=> The first two paragraphs describe requirements for AQM proposals.

{Section 12}
------------
All previous tests could be performed with or without ECN...
{Section 13.2}
--------------
   During the characterization process of a dropping policy, the
   *tester* MUST discuss the feasibility to add scheduling combined
   with the AQM algorithm.
=> This is more the job of an AQM designer, not of the tester...

{Section 14.1}
--------------
This should be discussed earlier in the document...

%%%%%%
   ascertain whether a specific AQM is not only better than drop-tail
   but also safe to deploy. Testers therefore need to provide a
=> Better than drop-tail with a BDP-sized buffer. The buffer size alone
   is a parameter that affects the performance of a CC scheme.

{Section 14.3.1}
----------------
[bullet 1]
   For example, to compare how well a queue-length based AQM scheme
   controls queueing delay vs. a queueing-delay based AQM scheme, a
   tester can identify the parameters of the schemes that control queue
   delay and ensure that their input values are comparable.
=> It would be preferable if AQM designers described these parameters.
   Ideally, an AQM proposal could describe the parameters as a function
   of network characteristics such as capacity and average RTT, similar
   to how it is done in Sally Floyd's Adaptive RED paper.

[bullet 2]
   In such situations, these schemes need to be compared over a range
   of input configurations.
=> From this text it can be inferred that the goal is to run some/all
   scenarios in this document with different settings of an AQM
   parameter that affects the delay/throughput trade-off. This is
   probably very valuable, because a network administrator can choose
   the desired delay and then see, from the graphs described in Section
   2.7, which AQM provides better throughput for this value of the
   delay. For this reason this paragraph is probably very important and
   should be moved to the beginning of the document, together with the
   requirement to compare an AQM against drop-tail.
It would also be good if an AQM document explicitly specified what
parameters to tune (similar to how the target in CoDel affects the
power metric: see
http://www.ietf.org/proceedings/84/slides/slides-84-tsvarea-4.pdf,
slides 17-19).

{Section 20.2}
--------------
   [HAYE2013] Hayes, D., Ros, D., Andrew, L., and S. Floyd, "Common TCP
              Evaluation Suite", IRTF (Work-in-Progress), 2013.
=> Is this referring to
   https://tools.ietf.org/html/draft-irtf-iccrg-tcpeval-01? This should
   be as precise as possible.
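PS: As a concrete illustration of the fairness-index suggestion made
under Section 4.3 above, this is the kind of summary metric we have in
mind (Jain's fairness index; the throughput values are made-up
examples):

```python
# Jain's fairness index over per-flow throughputs, as an alternative to
# the delay/goodput trade-off graph for the fairness scenarios (cf. our
# comments on Sections 4.3 and 5.2).  Values are hypothetical.

def jain_index(throughputs):
    """J = (sum x_i)^2 / (n * sum x_i^2); J = 1.0 means a fair share."""
    n = len(throughputs)
    total = sum(throughputs)
    return (total * total) / (n * sum(x * x for x in throughputs))

print(jain_index([5.0, 5.0]))  # 1.0 (equal shares)
print(jain_index([9.0, 1.0]))  # ~0.61 (one flow dominates)
```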
_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
