Re: [6lo] Mirja Kühlewind's Discuss on draft-ietf-6lo-fragment-recovery-13: (with DISCUSS and COMMENT)

Mirja Kuehlewind Thu, 19 Mar 2020 02:31:27 -0700

Hi Pascal,

Thanks for your updates. Sorry for my late reply. I unfortunately have some more 
comments. Please see below.


> On 6. Mar 2020, at 19:57, Pascal Thubert (pthubert) <pthub...@cisco.com> 
> wrote:
> 
> Hello Mirja
> 
> A great many thanks for your really deep review, this is both really 
> appreciated and  incredibly useful.
> 
> If that's OK with you let's make a round to clear the DISCUSSes separately 
> like I did for Benjamin's review.
> 
> Also considering the breadth of the discuss alone, I'd rather publish the 
> proposed changes so you and Benjamin can review the changes I made for your 
> DISCUSSes.
> 
> Please find the proposed changes discussed below in 
> https://www.ietf.org/rfcdiff?url2=draft-ietf-6lo-fragment-recovery-14 
> 
>> 
>> ----------------------------------------------------------------------
>> DISCUSS:
>> ----------------------------------------------------------------------
>> 
>> Thanks for this well written document, however, I have a couple points
>> below that need further clarification, all mostly related to congestion
>> control. From an editorial point of view most of this is discussed either in 
>> the
>> intro text of section 6, then some part in 7.1, and some in the appendix C. I
>> would really recommend that you instead have a separate section that much
>> more clearly states what should be done by default (probably no dynamic
>> window but a small fixed window, maybe of size 1) and what could be
>> done as further optimisation, and that also discusses the parameters/variables
>> there before the algorithms are discussed.
>> 
> 
> A size of 1 is probably not acceptable for the general LLN case, considering 
> the cost of an ack. I'd rather leave that to config.
> Otherwise, I agree. What about adding a subsection in section 4 as follows 
> (including changes that cover your comments below):
> 
> "
> 4.3.  Flow Control
> 
>   The inter-frame gap is the only protection that [FRAG-FWD] imposes by
>   default.  This document enables to group fragments in windows and
>   request intermediate acknowledgements so the number of in-flight
>   fragments can be bounded.  This document also adds an ECN mechanism
>   that can be used to adapt the size of the window, the size of the
>   fragments, and/or the inter-frame gap to protect the network.
> 
>   This specification enables the source endpoint to apply a flow
>   control mechanism to tune those parameters, but the mechanism itself
>   is out of scope.  In most cases, the expectation is that most
>   datagrams will represent only a few fragments, and that only the last
>   fragment will be acknowledged.  A basic implementation of the source
>   endpoint is NOT REQUIRED to variate the size of the window, the
>   duration of the inter-frame gap or the size of a fragment in the
>   middle of the transmission of a datagram, and it MAY ignore the ECN
>   signal or simply reset the window to 1 (see Appendix C for more) till
>   the end of this datagram upon detecting a congestion.
> 
>   The size of the fragments is typically computed from the Link MTU to
>   maximize the size of the resulting frames.  The size of the window
>   and the duration of the inter-frame gap SHOULD be configurable, to
>   roughly adapt the size of the window to the number of hops in an
>   average path, and to follow the general recommendations in
>   [FRAG-FWD], respectively.
> "
> 
Thanks for adding this. However, as I said a couple of times in my discuss 
there must be more guidance. This is not only about flow control but also about 
congestion control and it is not okay to declare congestion control out of 
scope. If you only do fragmentation but no retransmission, you don’t need to 
care about congestion control (but only flow control) as you don’t increase the 
actual network load by this. However, if you retransmit you are sending more 
data than the original sender (that hopefully is congestion controlled) and 
therefore you increase the load on the network and you MUST implement your own 
congestion control or some fixed rate limiting for that additional load. Saying 
this is out of scope and you want to do experimentation is not acceptable for a 
Proposed Standard.
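
To illustrate what "some fixed rate limiting for that additional load" could look like, here is a minimal token-bucket sketch that gates retransmissions only. The class name, the parameters rate_per_s and burst, and the values used in the example are hypothetical and are not taken from the draft; the caller supplies the clock value so the behaviour is deterministic.

```python
class RetransmitLimiter:
    """Token bucket applied to retransmitted fragments only (illustrative)."""

    def __init__(self, rate_per_s: float, burst: int):
        if rate_per_s <= 0 or burst < 1:
            raise ValueError("rate_per_s must be > 0 and burst >= 1")
        self.rate = rate_per_s      # tokens (retransmissions) added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)  # start with a full bucket
        self.last = None            # time of the previous call, in seconds

    def allow(self, now_s: float) -> bool:
        """Return True if one retransmission may be sent at time now_s."""
        if self.last is not None:
            elapsed = now_s - self.last
            # Refill in proportion to elapsed time, capped at the burst size.
            self.tokens = min(float(self.burst),
                              self.tokens + elapsed * self.rate)
        self.last = now_s
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical configuration: at most one retransmission per second.
limiter = RetransmitLimiter(rate_per_s=1.0, burst=1)
first = limiter.allow(0.0)    # bucket is full, so this one is allowed
second = limiter.allow(0.25)  # too soon, only a quarter of a token refilled
third = limiter.allow(2.0)    # enough time has passed to refill
```

Original fragments would bypass this limiter; only the extra load created by retransmission is bounded, which is the load the upstream sender's congestion control does not account for.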

To be clear the request of this discuss is to give a normative recommendation 
for the default value of the window size and/or inter-frame gap.

Further note, as you allow both the window and the inter-frame gap to be adapted 
dynamically, you actually implement two control mechanisms here. I actually 
recommend using only the inter-frame gap and not having a window here. You say a 
couple of times in your reply below that the window determines the ack rate; 
however, that is not clear to me because the ack rate should be a parameter at 
the receiver and not at the sender (maybe I don’t remember things correctly 
because it’s been a while since I read the draft and I couldn’t find anything 
about this in the draft quickly). If the window size does define the ack rate, 
however, then maybe you should rename that parameter accordingly.

However, if there is really a need for a window, I still recommend talking less 
about adapting this value dynamically and making clear that having a fixed value 
is the recommended default. Therefore I recommend removing the parameters 
MinWindowSize and MaxWindowSize because these should actually not be parameters 
that can be configured but are actual bounds. If someone decides to implement 
dynamic window adaptation, they can decide to re-introduce these parameters 
and make them configurable, but that doesn’t need to be part of this spec.

So it could be something like:

"Window_Size:  Window_Size MUST be at least 1 and less than 33. If the 
inter-frame gap is selected large enough to not overload the path and the 
one-way delay is known, the Window_Size SHOULD be set to the one-way trip time 
divided by the inter-frame gap.  Otherwise a small value of X SHOULD be 
configured. Note that the Window_Size determines the ack rate. If the 
Window_Size is set to 32, this means only the last Fragment is 
acknowledged in the first round. If it is set to a smaller value, more acks are 
generated but the load on the forward path will be lower. Window_Size MAY be 
adapted dynamically to reduce load on the forward path in case of congestion."
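
As a rough sketch of the rule in the proposed text above: the function name and argument names are hypothetical, and since the small default "X" is deliberately left open in the text, the caller has to supply it as fallback. The bound of 32 comes from the acknowledgment bitmap size mentioned in the thread.

```python
from typing import Optional

MAX_WINDOW = 32  # upper bound imposed by the 32-bit acknowledgment bitmap


def initial_window_size(one_way_delay_ms: Optional[float],
                        inter_frame_gap_ms: float,
                        fallback: int) -> int:
    """Pick a fixed Window_Size following the proposed text (illustrative).

    fallback stands in for the unspecified small value "X" and is used
    when the one-way delay is not known.
    """
    if not 1 <= fallback <= MAX_WINDOW:
        raise ValueError("fallback must be between 1 and 32")
    if one_way_delay_ms is None:
        return fallback
    if inter_frame_gap_ms <= 0:
        raise ValueError("inter_frame_gap_ms must be positive")
    # One-way trip time divided by the inter-frame gap, rounded down.
    window = int(one_way_delay_ms // inter_frame_gap_ms)
    # Clamp to the normative range: at least 1 and less than 33.
    return max(1, min(MAX_WINDOW, window))


# Hypothetical numbers: 120 ms one-way delay, 20 ms inter-frame gap.
w_known = initial_window_size(120.0, 20.0, fallback=2)
w_unknown = initial_window_size(None, 20.0, fallback=2)
```

With these assumed numbers the window comes out at 6 when the delay is known and falls back to the configured small value otherwise.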

Still, you also need to say more about how to set and dynamically adapt the 
inter-frame gap because that is probably the real limiting factor to avoid 
network overload.

Also, below you remove the recommendation to use the number of hops as the window 
size, but here you added it again. This is just incorrect. There is no 
dependency between the number of hops and the window size: If there is no 
bottleneck on the path, you can just send at line rate at the sender. If 
there is a bottleneck on the path and you send at a higher rate than the 
bottleneck, then sooner or later the buffer at that hop will fill up completely. 
So the window size depends only on the bottleneck rate and end-to-end delay 
(BDP) (which lets you calculate the number of packets in flight) plus the 
buffer size at the bottleneck. The number of hops is irrelevant.
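
To make the BDP argument concrete, here is a small worked sketch. The function names are hypothetical, and the example figures (250 kbit/s bottleneck, 100 ms round-trip delay, 125-byte fragments, 4 buffered fragments) are illustrative assumptions, not values from the draft.

```python
import math


def bdp_packets(bottleneck_bps: int, rtt_ms: int, packet_bytes: int) -> float:
    """Bandwidth-delay product expressed in packets.

    bits in flight = rate [bit/s] * delay [s]; dividing by the packet size
    in bits gives the number of packets needed to keep the bottleneck busy.
    """
    if bottleneck_bps <= 0 or rtt_ms <= 0 or packet_bytes <= 0:
        raise ValueError("all arguments must be positive")
    return (bottleneck_bps * rtt_ms) / (8 * 1000 * packet_bytes)


def max_window_without_loss(bottleneck_bps: int, rtt_ms: int,
                            packet_bytes: int, buffer_packets: int) -> int:
    """Largest window before the bottleneck buffer overflows: BDP + buffer."""
    bdp = bdp_packets(bottleneck_bps, rtt_ms, packet_bytes)
    return math.floor(bdp) + buffer_packets


# Assumed example values, for illustration only.
bdp = bdp_packets(250_000, 100, 125)
limit = max_window_without_loss(250_000, 100, 125, buffer_packets=4)
```

Note that the hop count does not appear as an input anywhere: two paths with the same bottleneck rate, delay, and buffer give the same result regardless of how many hops they traverse.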

Mirja


> 
>> And a bit of a provoking question: wouldn't it be easier to just use a 
>> reliable
>> transport protocol on top?
> 
> Just that the classical transports I'm aware of will :
> - not support the interframe gap and that's the basic requirement in 6lo 
> - not be capable to variate the interframe gap nor the fragment size
> - use the return path excessively for acks. 6lo is very much about saving 
> energy and bandwidth
> - be generally Overkill/too complex for a LLN node, see the text we just added
> 
> Also this spec enables a flow control mechanism but that mechanism is out of 
> scope. 
> It is internal to the sending endpoint and does not affect the 
> interoperability that this specification enables.
> A bit like there's an ECN in IP but the behavior belongs to the various 
> transport protocols. 
> Just that on top of signaling ECN we provide tools that the flow control 
> mechanism may play with, e.g., window size
> 
>> If this mechanism is intended to be used over a
>> short path with a few hops only (in a local network), I think this should be
>> stated more clearly at the beginning of the document. 
> 
> This is very true and implicit since we are talking about a contiguous 
> 6LoWPAN route-over mesh.
> Propose to tweak the last sentence in the introduction to
> "
>   This specification provides a method to forward fragments over
>   typically a few hops in a route-over 6LoWPAN mesh, and a selective
>   acknowledgment to recover individual fragments between 6LoWPAN
>   endpoints.  The method is designed to limit congestion loss in the
>   network and addresses the requirements that are detailed in
>   Appendix B.
> 
> "
> 
> 
>> In the appendix you state
>> this: " In addition, deploying such a mechanism requires
>>   that the end-to-end transport is aware of the delivery properties of
>>   the underlying LLN,..."
>> But I'm not sure what you mean...? Can you further explain?
> 
> "Requires" might be exaggerated since TCP was shown to work fine in LLNs. 
> But things like the default RTO of 1s is really unsafe, for an endpoint in 
> the internet that communicates to the LLN device (e.g., using HTTP instead of 
> COAP).
> But it's  probably better to just remove that text. Instead it is probably 
> good to mention the extra acks on the return path.
> For one thing, though, we do not want to discard packets that traversed the 
> LLN to indicate congestion to the source. Ideally things like slow start 
> should be really smooth, and the window size should remain very small to cope 
> with the memory available in the LN node without the need to drop a packet in 
> the LLN.
> 
> The end of appendix A becomes:
> 
> "
> 
>   Mechanisms such as TCP or application-layer segmentation could be
>   used to support end-to-end reliable transport.  One option to support
>   bulk data transfer over a frame-size-constrained LLN is to set the
>   Maximum Segment Size to fit within the link maximum frame size.
>   Doing so, however, can add significant header overhead to each
>   802.15.4 frame and cause extraneous acknowledgements across the LLN
>   compared to the method in this specification.
> 
> "
> 
>> 
>> 1) Sec 6:
>> "Upon exhaustion of the retries the
>>   sender may either abort the transmission of the datagram or retry the
>>   datagram from the first fragment with an 'X' flag set in order to
>>   reestablish a path and discover which fragments were received over
>>   the old path in the acknowledgment bitmap. "
>> I'm not sure about this "or". Why should the first fragment be more
>> successful than any other which requests an ACK? Also if you really want to
>> keep this condition, you need to specify it better. How often do you retry? I
>> guess you need to set the PTO again...? Further the RTO should also
>> implement an exponential back-off.
> 
> The first fragment draws a new path so it may avoid the problem though there 
> is no guarantee. 
> Once the new path is established, the next fragments will follow it and the 
> segments of the old path that are no more used time out.
> 
> Proposed updated text:
> 
> "
>   This automatic repeat request (ARQ) process MUST be protected by a
>   Retransmission Time Out (RTO) timer, and the fragment that carries
>   the 'X' flag MAY be retried upon a time out for a configurable number
>   of times (see Section 7.1) with an exponential backoff.  Upon
>   exhaustion of the retries the sender may either abort the
>   transmission of the datagram or resend the first fragment with an 'X'
>   flag set in order to establish a new path for the datagram and obtain
>   the list of fragments that were received over the old path in the
>   acknowledgment bitmap.
> 
> "
> 
> 
>> 
>> 2) sec 6.3:
>> "Upon an acknowledgment with a NULL bitmap, the sender endpoint
>>   MUST abort the transmission of the fragmented datagram with one
>>   exception: In the particular case of the first fragment, it MAY
>>   decide to retry via an alternate next hop instead."
>> What's mean with "In the particular case of the first fragment"? And does
>> this mean it should retry only with the first fragment or the whole
>> transmission. 
>> However, if this signal is from the receiving endpoint why should that
>> endpoint change it mind only if a different path is used? If the assumption 
>> is
>> that this NULL bitmap is sent by an intermediate node? However, then it
>> would make sense to  rather signal this information explicitly (e.g. using a
>> flag).
> 
> This is also linked to the fact that the first fragment draws the path as in 
> the case above. As you figured, the expectation is that a node in the middle 
> experiences an issue and cannot do the FF operation for that datagram, so it 
> aborts with a NULL bitmap.
> 
> Yes, the problem could be in the receiving endpoint in which case rerouting 
> does not help. But then, it is probably temporary, e.g., if the receiving 
> endpoint has a single reassembly buffer, which is quite common, and is 
> already receiving a datagram from another source. There is a variety of use 
> cases and which is most probable depends on the use case. 
> 
> So let the source endpoint decide. 
> 
> 
> 
>> 3) Sec 7.1 (and to some extend sec 6)
>> "   OptWindowSize:  The OptWindowSize is the value for the Window_Size
>>      that the sender should use to start with.  It is greater than or
>>      equal to MinWindowSize.  It is less than or equal to
>>      MaxWindowSize.  The Window_Size should be maintained below the
>>      number of hops in the path of the fragment to avoid stacking
>>      fragments at the bottleneck on the path.  If an inter-frame gap is
>>      used to avoid interference between fragments then the Window_Size
>>      should be at most on the order of the estimation of the trip time
>>      divided by the inter-frame gap."
>> This needs normative language and more explanation. 
> 
> Well, this was not intended to be normative but just a rule of a thumb. Some 
> (many I expect) people will want to ack only the last fragment and that's a 
> tradeoff between the cost of the ack back and the chances of congestion loss. 
> 
> 
>> I recommend to even
>> say that if no congestion control (as discussed in the appendix) is applied, 
>> the
>> Window MUST be set to 1. 
> 
> This makes full sense in other cases but is too expensive here. The default 
> that people go for is a single ack in the end. Note that this spec compares 
> to the art of RFC 4944 where all the fragments are pushed to the network 
> without any feedback as hot potatoes. Apparently that did not work too well 
> in cases, thus this work. But people still love the fact that there's no 
> traffic back and that is why this original work 
> (https://tools.ietf.org/html/draft-thubert-6lowpan-simple-fragment-recovery-01)
>  was split 3 docs, this,  the overarching minimal-fragments and the LWIG 
> draft that forwards fragments in the RFC 4944 and no ack.
> 
> 
> 
>> Further, the assumption that the window can or
>> should be set to (at maximum) the number of hops does not seem correct to
>> me. No matter how many hops there are, packets are only queued at the
>> bottleneck (the link where the current rate is smaller than the sending rate)
>> and it depends on the sending rate of the bottleneck link how many packets
>> need to be queued. This is completely independent of the number of hops.
> 
> The rationale here is due to the inter-frame gap. In normal conditions, it 
> ensures that a fragment progresses before the next comes in. So there's at 
> most one fragment per hop and there's no point having a window bigger that. 
> There's less than that actually, because frag 0 reaches node 2 before node 0 
> can send frag 1 to node 1, so we could divide the recommendation by 2. But 
> then we need to keep the network busy while the ack comes back. 
> 
> I agree that this is not providing the optimal window but more like a 
> reasonable upper bound. If there was no interframe gap I'd say (please 
> correct me) that the lower bound to keep the bottleneck busy (in fragments 
> not bytes) is (Bottleneck Speed / Bottleneck MTU) * RTT.  The average low 
> power network is mostly sleeping, and does not  experience congestion in 
> normal operation. So there is usually no bottleneck. If  the LLN is 
> homogeneous we get a lower bound of  (PHY Speed / MTU) * RTT. But then, some 
> people could be conservative and use a window of 1 as you recommend.
> 
> All in all it appears that the text creates more confusion than help, and 
> dives into the sender flow control which is out of scope.
> I'd rather remove recommendation on the runtime window size at all.
> 
> So we'd get:
> "
>   OptWindowSize:  The OptWindowSize is the value for the Window_Size
>      that the sender should use to start with.  It is greater than or
>      equal to MinWindowSize.  It is less than or equal to
>      MaxWindowSize.  A rule of a thumb for OptWindowSize could be an
>      estimation of the trip time divided by the inter-frame gap to keep
>      the network busy.
> "
> 
>> Further, even if that would be true, as long as this document does not 
>> discuss
>> also a way to estimate or know the number of hops, this advice would
>> unfortunately be useless... 
> 
> Yes, better remove it
> 
>> Further I don't think pointing to rfc6298 for RTT
>> calculation is sufficient (as done in the appendix). rfc6298 assume frequent
>> ACKs and a reasonably large window, which is both not the case here. All in
> 
> TCP has been successfully used on LLNs, though, so it cannot be that bad a 
> recommendation.
> Note that there's probably less fragments than hops, so there's probably not 
> a chance to even measure RTT before all fragments are out, and if there is, 
> not many chances to update the initial reading till the datagram is fully 
> sent. I agree there may be better ways so we need to remove the RECOMMEND. 
> What about:
> 
> "
>   For the lack of a more adapted
>   technique, the method detailed in "Computing TCP's Retransmission
>   Timer" [RFC6298] may be used for that computation. 
> 
> "
> 
> Earlier in 6.0 I also suggest to change:
> 
> "
>   The sender
>   protects the transmission over the LLN mesh with a retry timer that
>   is configured for a use case and may be adapted dynamically, e.g.,
>   according to the method detailed in [RFC6298].  
> 
> "
> 
> 
>> all, any window adjustments itself are not described at all. What should be
>> done when a congestion marking is received? How does the window need to
>> be adjusted based on an RTO? When should the window be increased again?
>> And how much?
> 
> This is out of scope.
> 
> The goal of the draft is to specify what goes over the air to recover 
> fragments. The flow control operation is an internal decision to the sender 
> endpoint, and can be adapted to the use case an interoperation issue with the 
> other endpoints. We do not have enough experience to enforce something, and 
> there can be very different use cases and variations, so we only provide 
> non-normative hints.
> 
> We want to allow implementations to try their own stuff, including slow start 
> and fast recovery for a  device that can afford it in a use case that 
> justifies it. I hope we'll see future spec(s) that specify flow control 
> mechanisms, but as I said earlier, this is out of scope here, we just provide 
> the controls.
> 
> Following your earlier recommendation, we could suggest in case of a ECN we 
> set  W=1 and stay there as a rule of a thumb for that datagram in the absence 
> of a more intelligent / adapted flow control operation
> 
> To clarify let me change
> "
> 4.  Extending draft-ietf-6lo-minimal-fragment
> 
>   This specification implements the generic FF technique defined in
>   "LLN Minimal Fragment Forwarding" [FRAG-FWD], provides end-to-end
>   fragment recovery and mechanisms that can be used for flow control.
> 
> 
> "
> 
> Also see the new section 4.3
> 
> 
>> 
>> 4) Sec 7.1.: Inline with the TSV-ART review (Thanks Collin!), the parameters
>> need more guidance. Especially for the number of retries it should be possible
>> to recommend a default value (e.g. 3) and it would be good to also give an
>> upper limits (MUST NOT be larger than X). Similar for the window size: there
>> should be also at least a default value (see comment above). And further the
>> RTO needs further explanation about how to find a reasonable value. If the
>> RTO is configured (and not estimated dynamically) e.g. it could be set to 3x
>> the maximum expected RTT in the respective network. And it would be even
>> better to provide a minimum default (initial) value. Note that TCP is also
>> designed to work on a large variety of timescales and a minimum initial value
>> of 1s is seen as safe for all Internet scenarios. It's really important to 
>> also
>> provide some recommendations like this here.
> 
> Makes sense. 
> 
> The number of retries is really bounded by the upper layer protocol. 
> 
> It is actually the time allowed to transfer the datagram that must remain 
> below whatever the upper layer protocol expects.
> We could give a rule of a thumb and yes your 3 looks good. 
> 
> The window size can be anything, I expect many will only ask for an ack the 
> last fragment. 
> But by construction that is bounded by the bitmap to 32. 
> 
> All in all we get:
> 
> "
>   MinWindowSize:  The minimum value of Window_Size that the sender can
>      use.  A value of 1 is RECOMMENDED.
> 
>   OptWindowSize:  The OptWindowSize is the value for the Window_Size
>      that the sender should use to start with.  It is greater than or
>      equal to MinWindowSize.  It is less than or equal to
>      MaxWindowSize.  A rule of a thumb for OptWindowSize could be an
>      estimation of the one-way trip time divided by the inter-frame
>      gap.  If the acknowledgement back is too costly, it is possible to
>      set this to 32, meaning that only the last Fragment is
>      acknowledged in the first round.
> 
>   MaxWindowSize:  The maximum value of Window_Size that the sender can
>      use.  The value MUST be strictly less than 33.
> 
>   An implementation may perform its estimate of the RTO or use a
>   configured one.  The ARQ process is controlled by the following
>   parameters:
> 
>   MinARQTimeOut:  The minimum amount of time a node should wait for an
>      RFRAG Acknowledgment before it takes the next action.  It MUST be
>      more than the maximum expected round-trip time in the respective
>      network.
> 
>   OptARQTimeOut:  The initial value of the RTO, which is the amount of
>      time that a sender should wait for an RFRAG Acknowledgment before
>      it takes the next action.  It is greater than or equal to
>      MinARQTimeOut.  It is less than or equal to MaxARQTimeOut.  See
>      Appendix C for recommendations on computing the round-trip time.
>      By default a value of 3 times the maximum expected round-trip
>      time in the respective network is RECOMMENDED.
> 
>   MaxARQTimeOut:  The maximum amount of time a node should wait for the
>      RFRAG Acknowledgment before it takes the next action.  It must
>      cover the longest expected round-trip time, and be several times
>      less than the time-out that covers the recomposition buffer at the
>      receiver, which is typically on the order of the minute.  An upper
>      bound can be estimated to ensure that the datagram is either fully
>      transmitted or dropped before an upper layer decides to retry it.
> 
>   MaxFragRetries:  The maximum number of retries for a particular
>      fragment.  A default value of 3 is RECOMMENDED.  An upper bound
>      can be estimated to ensure that the datagram is either fully
>      transmitted or dropped before an upper layer decides to retry it.
> 
>   MaxDatagramRetries:  The maximum number of retries from scratch for a
>      particular datagram.  A default value of 1 is RECOMMENDED.  An
>      upper bound can be estimated to ensure that the datagram is either
>      fully transmitted or dropped before an upper layer decides to
>      retry it.
> 
> 
> 
> 
> "
> 
>> 
>> 5) Sec 7.2:
>> "The management system should monitor the number of retries and of ECN
>>   settings that can be observed from the perspective of both the sender
>>   and the receiver, and may tune the optimum size of Fragment_Size and
>>   of Window_Size, OptFragmentSize, and OptWindowSize, respectively, at
>>   the sender."
>> This does not seem correct, as OptFragmentSize and OptWindowSize are
>> the initial values which are configured and therefore should not be changed
>> dynamically. Only Fragment_Size and Window_Size are changed. Further the
>> network should also normatively state somewhere that Fragment_Size and
>> Window_Size MUST not grow above the configured max value. That seems
>> obvious but it's better to be explicit and use normative language 
>> respectively.
> 
> This is meant to change the starting values to be applied to the next 
> datagrams.
> Note that talking to the management system can take a very long time. 
> We do not expect the kind of reactivity that would affect the current 
> datagram.
> 
> Proposed changes:
> "
> 7.1.  Protocol Parameters
> 
>   The management system SHOULD be capable of providing the parameters
>   listed in this section and an implementation MUST abide by those
>   parameters and in particular never exceed the minimum and maximum
>   configured boundaries.
> 
> "
> 
> And
> 
> "
> 7.2.  Observing the network
> 
>   The management system should monitor the number of retries and of ECN
>   settings that can be observed from the perspective of both the sender
>   and the receiver with regards to the other endpoint.  It may then
>   tune the optimum size of Fragment_Size and of Window_Size,
>   OptFragmentSize, and OptWindowSize, respectively, at the sender
>   towards a particular receiver, applicable to the next datagrams.
> "
> 
> 
> 
>> 6) Further sec 7.2 says:
>> "The inter-frame gap is another tool that can be
>>   used to increase the spacing between fragments of the same datagram
>>   and reduce the ratio of time when a particular intermediate node
>>   holds a fragment of that datagram."
>> However, inter-frame gap is a configuration parameter and this is the first
>> time that adapting it dynamically is mentioned here. If you want to adapt it
>> dynamically you need to add more information.
> 
> This is now discussed in 4.3, though not in great detail, since the flow 
> control mechanism is out of scope.
> 
> Again many thanks!
> 
> Pascal
> 

