Dear Joel,

Thank you very much for the excellent review! Below we have tried to resolve all of your comments and issues. Please let us know whether you are satisfied with these solutions.

On 3/13/2010, "Joel M. Halpern" <[email protected]> wrote:

>I have been selected as the General Area Review Team (Gen-ART)
>reviewer for this draft (for background on Gen-ART, please see
>http://www.alvestrand.no/ietf/gen/art/gen-art-FAQ.html).
>
>Please resolve these comments along with any other Last Call comments
>you may receive.
>
>Document: draft-ietf-nsis-rmd-16.txt
>   RMD-QOSM - The Resource Management in Diffserv QOS Model
>Reviewer: Joel M. Halpern
>Review Date: 13-Mar-2010
>IETF LC End Date: 22-Mar-2010
>IESG Telechat date: N/A
>
>Summary: This document is not ready for publication as an Experimental RFC.
>
>Clarity Issue:
>   The document makes repeated use of the term Severe congestion. It
>seems inevitable that a somewhat fuzzy definition will be used for that,
>and I would not have concern about such fuzziness. However, the
>definition used in the document, in section 2, presumably with the
>understanding and agreement of the working group, is
>"congestion that occurs when a node or link fails and the traffic is
>rerouter through another node or link." This property (being caused by
>node or link failure) has nothing to do with the severity of the
>congestion. The text goes on to talk about this type of congestion not
>be addressable via admission control.
>It is possible that the document means severe congestion (in the more
>conventional sense) with the added caveat that it is brought about by
>failure. But that is not what the definition says. (If that is indeed
>the intent, then clarifying the definition will suffice to resolve this
>issue.)
>Also, as a lesser matter, there are systems which do address / prevent
>element failures from causing severe congestion by using admission
>control, so the claim in the definition that it can not be addressed by
>admission control is at best misleading. It requires very different
>behaviors than RMD,so are presumably inapplicable to this situation.

Georgios: We would like to change the definition of severe congestion given in Section 2 as follows.

From:

Severe congestion: is a congestion that occurs when a node or link fails and the traffic is rerouter through another node or link. If no measures are taken than the node or the link can become severely congested and all traffic passing through the node or link will severely degrade in QoS. This type of congestion cannot be solved using admission control mechanisms.

INTO:

Severe congestion: The congestion situation on a particular link within the RMD domain in which a significant increase in its real packet queue occupancy occurs because, due to a link failure, re-routed traffic has to be supported by this particular link. A failure in a communication path, e.g., of a router or a link, causes the routing algorithms to adapt to this failure by changing the routing decisions to reflect the changes in topology and traffic volume. As a result, the re-routed traffic will follow a new path and link, which may result in severely overloaded nodes and links, since for a long time they need to support more traffic than they are able to.

>
>Major issues:
>   Section 3.2.3 on applicability seems to state that although there are
>Multiple RMD-QOSM schemes, none are mandatory to implement. And that a
>domain must all use one scheme. I am not sure if "scheme: here refers
>to this document as distinct from some other document, or refers to the
>variations (such as reduced state, and two varieties of stateless) on
>interior node behavior. If, as seems to be the case since the following
>text defines 5 schemes, it is referring to the interior behavior
>choices, it would seem that there needs to be a mandatory-to-implement
>scheme in order for this document to promote interoperability rather
>than fragmentation of the network.

Georgios: We will make the following modifications in order to solve this issue.

In Section 3.2.3 we will replace the existing paragraph that is associated with the above issue with the following paragraph:

------------------
A very important consideration on using RMD-QOSM is that within one RMD domain only one of the following RMD-QOSM schemes can be used at a time. Thus an RMD router can never process and use two different RMD-QOSM signaling schemes at the same time. However, all schemes MUST be implemented within one RMD domain. The operator of an RMD domain MUST pre-configure all the QNE edge nodes within one domain such that the <SCH> field included in the "PHR container", see Section 4.1.2, and the "PDR container", see Section 4.1.3, always uses the same value, such that within one RMD domain only one of the RMD-QOSM schemes described below can be used at a time.
------------------

Moreover, in Sections 4.1.2 and 4.1.3 we will include a new description of the <SCH> field, which will be carried in the rightmost 3 bits of the second 32-bit payload word. The following text will be used:

------------------
In Section 4.1.2:
------------------
<SCH>: 3 bits. The <SCH> value is used to specify which of the 6 RMD scenarios, see Section 3.2.3, MUST be used within the RMD domain. The operator of an RMD domain MUST pre-configure all the QNE edge nodes within one domain such that the <SCH> field included in the "PHR container" always uses the same value, such that within one RMD domain only one of the RMD-QOSM schemes described below can be used at a time. All the QNE interior nodes MUST interpret this field before processing any other "PHR container" payload fields.

The currently defined <SCH> values are:

o 0: RMD-QOSM scheme MUST be: "per flow congestion notification based on probing";
o 1: RMD-QOSM scheme MUST be: "per flow RMD NSIS measurement based admission control";
o 2: RMD-QOSM scheme MUST be: "per flow RMD reservation based" in combination with "severe congestion handling by the RMD-QOSM refresh procedure";
o 3: RMD-QOSM scheme MUST be: "per flow RMD reservation based" in combination with "severe congestion handling by proportional data packet marking";
o 4: RMD-QOSM scheme MUST be: "per aggregate RMD reservation based" in combination with "severe congestion handling by the RMD-QOSM refresh procedure";
o 5: RMD-QOSM scheme MUST be: "per aggregate RMD reservation based" in combination with "severe congestion handling by proportional data packet marking";
o 6-7: reserved.

The default value of the <SCH> field SHOULD be set to 3.

------------------
In Section 4.1.3:
------------------
<SCH>: 3 bits. The <SCH> value is used to specify which of the 6 RMD scenarios, see Section 3.2.3, MUST be used within the RMD domain. The operator of an RMD domain MUST pre-configure all the QNE edge nodes within one domain such that the <SCH> field included in the "PDR container" always uses the same value, such that within one RMD domain only one of the RMD-QOSM schemes described below can be used at a time. All the QNE interior nodes MUST interpret this field before processing any other "PDR container" payload fields.

The currently defined <SCH> values are:

o 0: RMD-QOSM scheme MUST be: "per flow congestion notification based on probing";
o 1: RMD-QOSM scheme MUST be: "per flow RMD NSIS measurement based admission control";
o 2: RMD-QOSM scheme MUST be: "per flow RMD reservation based" in combination with "severe congestion handling by the RMD-QOSM refresh procedure";
o 3: RMD-QOSM scheme MUST be: "per flow RMD reservation based" in combination with "severe congestion handling by proportional data packet marking";
o 4: RMD-QOSM scheme MUST be: "per aggregate RMD reservation based" in combination with "severe congestion handling by the RMD-QOSM refresh procedure";
o 5: RMD-QOSM scheme MUST be: "per aggregate RMD reservation based" in combination with "severe congestion handling by proportional data packet marking";
o 6-7: reserved.

The default value of the <SCH> field SHOULD be set to 3.

>
>   In this day and age, it seems surprising that the protocol specifies
>that the interior messages are to be sent with no security. The IETF is
>actively working to improve the security of intra-domain and
>inter-domain routing, so this decision seems wrong. (Even for an
>experiment.) (Section 4.4, 4th bullet.) At the very least, some
>explanation of this choice is necessary.

Georgios: You are right, the text needs to be clarified. What we meant is that RMD-QOSM relies mainly on the security and reliability support provided by the bound end-to-end session, which runs between the boundaries of the RMD domain (i.e., the RMD-QOSM QNE edges), and on the security provided by the D-mode. We would like to change the specific bullet into:

* When the QNE Ingress needs to send an intra-domain RESERVE message that is not an initial RESERVE, the QoS-NSLP sends this message by including in the GIST API SendMessage primitive attributes that imply the use of the Datagram Mode, e.g., the Unreliable attribute.
Furthermore, the Local policy attribute is set such that GIST sends the intra-domain RESERVE message in Q-mode even if there is a routing state at the QNE Ingress. In this way the GIST functionality uses its local policy to send the intra-domain RESERVE message by piggybacking it on a GIST DATA message and sending it in Q-mode even if there is a routing state for this session. The intra-domain RESERVE message is piggybacked on the GIST DATA message that is forwarded and processed by the QNE Interior nodes up to the QNE Egress.
------------------

Moreover, we will include the following paragraph in the introductory part of Section 4.4:

------------------
RMD-QOSM relies on the security and reliability support that is provided by the bound end-to-end session, which is running between the boundaries of the RMD domain (i.e., the RMD-QOSM QNE edges), and the security provided by the D-mode.
------------------

>
>   The text in section 4.1.2 states that the 8 bit overload % field
>contains a real value. However, I could not find a description of the
>encoding by which a real value between (between 0 and 1?) should be
>encoded in the message.

Georgios: We agree that the description of this field is not clear and that more bits than needed are used to represent this type of overload. Therefore, we will make the following changes.

In Section 4.1.2 we will make the following change.

Change from:

<Overload %>: 8 bits In case of severe congestion the level of overload is indicated by the Overload %. Overload % is the percentage of the measured PHB bit rate that is above the bit rate rate used to detect a severe congestion. Overload % SHOULD be higher than 0 if S bit is set. If overload in a node is greater than the overload in a previous node then Overload % SHOULD be updated. For more details see Section 4.6.1.6.1. Note that this field represents a real parameter.

INTO:

<Overload>: 1 bit.
This field is used during the severe congestion handling scheme that uses the RMD-QOSM refresh procedure. This bit is set when an overload on a QNE interior node is detected and when this field is carried by the "PHR_Refresh_Update" container. <Overload> SHOULD be set to "1" if the <S> bit is set. For more details see Section 4.6.1.6.1.

In Section 4.1.3 a similar change will be applied for this parameter:

<Overload>: 1 bit. This field is used during the severe congestion handling scheme that uses the RMD-QOSM refresh procedure. This bit is set when an overload on a QNE interior node is detected and when this field is carried by the "PDR_Congestion_Report" container. <Overload> SHOULD be set to "1" if the <S> bit is set. For more details see Section 4.6.1.6.1.
------------------

This change to the <Overload> field will be carried through in the rest of the text.

>
>Minor issues:
>   The measurement based admission control mechanism used here looks
>remarkably similar to the classical RSVP Predicative service. Both of
>these are based on the assumption that current measured characteristics
>are an indicator of future load. It is not at all apparent that there
>is any such relationship. It seems that the text ought to include some
>indication as to what the basis of suggesting this be used is, and why
>it is thought to be meaningful. Even if the argument is "it is worth
>trying", it seems worth stating that, and stating why it is thought that
>it will work now.

Georgios: You are right that the measurement based admission control scheme can only support a predictive service. We would like to change the following paragraph in Section 3.1.

From:

The measurement-based algorithm continuously measures traffic levels and the actual available resources, and admits flows whose resource needs are within what is available at the time of the request.

INTO:

The measurement-based algorithm continuously measures traffic levels and the actual available resources, and admits flows whose resource needs are within what is available at the time of the request. The measurement based algorithm is used to support a predictive service where the service commitment is somewhat less reliable than the service that can be supported by the reservation based method. A main assumption made by such measurement based admission control mechanisms is that the aggregated PHB traffic passing through an RMD interior node is high and that, therefore, current measurement characteristics can be considered an indicator of future load.

>   It would probably be helpful to explain why it is necessary or
>desirable to use two different RESERVE messages across the same domain,
>traversing the same set of devices, with different but closely related
>information. (particularly in light of the comments about reducing load
>on intermediate devices.)

Georgios: You are right, we will try to clarify this as follows. In Section 3.1 we would like to change the following text.

From:

The basic RMD-QOSM/QoS-NSLP signaling is shown in Figure 3. The signalling scenarios are accomplished using the QoS-NSLP processing rules defined in [QoS-NSLP], in combination with the RMF triggers sent via the QoS-NSLP-RMF API described in [QoS-NSLP]. A RESERVE message is created by a QNI with an Initiator QSpec describing the reservation and forwarded along the path towards the QNR.

INTO:

"The basic RMD-QOSM/QoS-NSLP signaling is shown in Figure 3. The signalling scenarios are accomplished using the QoS-NSLP processing rules defined in [QoS-NSLP], in combination with the RMF triggers sent via the QoS-NSLP-RMF API described in [QoS-NSLP].
Because the QoS model supported within the RMD domain can differ from the end-to-end QoS model applied at the edges of the RMD domain, the RMD interior node reduced state reservations can be updated independently of the per-flow end-to-end reservations, see Section 4.7 of [QoS-NSLP]. Therefore, two different RESERVE messages are used within the RMD domain: one RESERVE message that is associated with the per flow end-to-end reservations and is used by the edges of the RMD domain, and one that is associated with the reduced state reservations within the RMD domain."

>   The applicability section states that this mechanism can only be used
>with the EF DSCP. Is it further the case that it can only be used for
>traffic which consistently uses a stable amount of bandwidth (per
>reservation)? One of the difficulties with the style of reservation
>based on measurement of load is that the end pointing requesting the
>measurement must be aware of whether the measurement data includes the
>flow being considered for admission. Otherwise, large flows can cause
>significant confusion. With very stable flows, as long as the
>measurements are not requested too often, this is achievable.
>Otherwise, it is not at all clear to this reader how the proposed
>mechanism would work (particularly when refreshing a reservation).
>   Continuing this line of questioning, the mechanism for modification
>seems to send the new bandwidth through the stateless intra-domain
>routers. Since they are stateless, those routers do now know what the
>old reservation was. And the measurements presumably include traffic
>under the old reservation. if these are added together, significant
>double-counting woudl seem to occur. (This is listed as minor on the
>premise that the protocol presumably actually works, and therefore the
>problem is one of reader comprehension, rather than more serious
>technical issues.

Georgios: You are right that the descriptions are not clear.
Please note that with the measurement based scheme the requested peak bandwidth of a flow is carried by the admission control request. The admission decision is considered positive if the currently carried traffic, as characterized by the measured statistics, plus the requested resources for the new flow exceeds the system capacity with a probability smaller than a value alpha. Otherwise, the admission decision is negative.

It is important to emphasize that, because the interior nodes are stateless, they do not store information about previous admission control requests. This could lead to a situation where the admission control accuracy decreases when multiple flows sharing a common interior node request admission control simultaneously. By applying measuring techniques, see e.g., [JaSh97], [GrTs03], which use current and past information on NSIS sessions that requested resources from an NSIS aware interior node, the decrease in admission control accuracy can be limited. Moreover, the RMD measurement based schemes described in this document do not use any refresh procedures, since these approaches are used in stateless nodes, see Section 4.6.1.3.

In order to clarify the text, we would like to enhance the abstract description of the measurement based admission control mechanism given in Section 3.1 by adding the following paragraph:

It is important to emphasize that the RMD measurement based schemes described in this document do not use any refresh procedures, since these approaches are used in stateless nodes, see Section 4.6.1.3. With the measurement based scheme the requested peak bandwidth of a flow is carried by the admission control request.
The admission decision is considered positive if the currently carried traffic, as characterized by the measured statistics, plus the requested resources for the new flow exceeds the system capacity with a probability smaller than a value alpha. Otherwise, the admission decision is negative. It is important to emphasize that, because the interior nodes are stateless, they do not store information about previous admission control requests. This could lead to a situation where the admission control accuracy decreases when multiple flows sharing a common interior node request admission control simultaneously. By applying measuring techniques, see e.g., [JaSh97], [GrTs03], which use current and past information on NSIS sessions that requested resources from an NSIS aware interior node, the decrease in admission control accuracy can be limited.

>
>   I was not able to understand the purpose or use of the K bit. I may
>have missed it in the dense text. Assuming there is an explanation, a
>pointer at the point where the bit is defined to the text which explains
>its use would be a very good idea.

Georgios: You are right. The use of the <K> bit is described in Section 4.6.1.5.2. The description of the <K> bit will be changed as follows:

<K>: 1 bit. When set to "1" it indicates that the resources/bandwidth carried by a tearing RESERVE MUST NOT be released and the resources/bandwidth carried by a non tearing RESERVE MUST NOT be reserved/refreshed. For more details see Section 4.6.1.5.2.

Best regards,
Georgios

>
>Yours,
>Joel M. Halpern
>
>Nits/editorial comments:

_______________________________________________
Gen-art mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/gen-art
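
As a rough illustration of the <SCH> field proposed above for Sections 4.1.2 and 4.1.3, the following minimal Python sketch shows how a QNE might extract the value and compare it with the scheme pre-configured for the domain. It is only a sketch under stated assumptions: the enum, constant, and function names are hypothetical and do not come from the draft, and it assumes that the 3 rightmost bits of the second 32-bit payload word are the three least significant bits of that word read as an unsigned integer.

```python
from enum import IntEnum


class RmdScheme(IntEnum):
    """<SCH> values as listed in the proposed text (names are hypothetical)."""
    PER_FLOW_PROBING = 0                           # congestion notification based on probing
    PER_FLOW_MEASUREMENT_ADMISSION = 1             # NSIS measurement based admission control
    PER_FLOW_RESERVATION_REFRESH = 2               # reservation based + refresh procedure
    PER_FLOW_RESERVATION_PROPORTIONAL_MARKING = 3  # reservation based + proportional marking
    PER_AGGREGATE_RESERVATION_REFRESH = 4          # aggregate reservation + refresh procedure
    PER_AGGREGATE_RESERVATION_PROPORTIONAL_MARKING = 5  # aggregate + proportional marking


# The proposed text says the default value SHOULD be 3.
DEFAULT_SCHEME = RmdScheme.PER_FLOW_RESERVATION_PROPORTIONAL_MARKING

# Assumption: the 3 rightmost bits are the 3 least significant bits of the word.
SCH_MASK = 0b111


def extract_sch(second_payload_word: int) -> RmdScheme:
    """Return the scheme encoded in the second 32-bit payload word.

    Raises ValueError for out-of-range words and for the reserved values 6 and 7.
    """
    if not 0 <= second_payload_word <= 0xFFFFFFFF:
        raise ValueError("payload word must fit in 32 bits")
    value = second_payload_word & SCH_MASK
    try:
        return RmdScheme(value)
    except ValueError:
        raise ValueError(f"reserved <SCH> value: {value}") from None


def matches_domain_scheme(second_payload_word: int,
                          configured: RmdScheme = DEFAULT_SCHEME) -> bool:
    """Check that a container carries the single scheme configured for the domain."""
    return extract_sch(second_payload_word) == configured
```

An interior node following the proposed text would perform a check like `matches_domain_scheme` before interpreting any other payload field of the container. How a mismatch or a reserved value should be handled is not stated in the text above, so the sketch simply reports it to the caller.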
