Dear Carles, all,
it took longer than I expected a week or so ago to find time to go
through the draft. My apologies, and thanks for your patience.
It looks fine to me, with a few exceptions that I think still need some
work.
Please see inline.
Cheers,
/Markku
On Wed, 5 Jun 2019, Carles Gomez Montenegro wrote:
Dear Markku,
Thank you very much for your comprehensive and detailed review of the
draft. Your constructive comments have been very useful to address issues
and improve the quality of the document. Our updates can be found in
revision -08.
Please find below our inline responses to your comments.
Hi all,
I have reviewed the -07 version of this draft for the WGLC.
The draft is very useful for many developers using or considering using
TCP in CNN scenarios. It has improved a lot from the previous
versions, but there are still a number of issues worth addressing.
See comments below.
Best regards,
/Markku
Sec 4.1.
Title: Path properties
- Would it be better to use "Addressing path properties" or something
similar as TCP cannot (much) affect the properties?
Sounds good!
Sec. 4.1.1.
If I understand it correctly, the discussion in this section is intended
to be all about avoiding Path MTU Discovery when running TCP over IPv6?
Therefore, "Avoiding Path MTU Discovery in IPv6" would probably be
a better title?
From our point of view, the section is actually about setting the MSS to a
suitable value, which then may help avoid the need to support Path MTU
Discovery, as well as the need to perform IP-layer fragmentation at the
source. We have explicitly added the latter in -08.
This seems fine now, except I noticed another problem that I didn't spot
last time. The text says a few times something about
limiting/setting the MTU, where you actually want to advise limiting the IP
datagram size (by setting the MSS), just like you say above. But the text
reads, for example:
... it may be desirable to limit the MTU to 1280 bytes ...
which I believe should read something like:
... it may be desirable to limit the IP datagram size to 1280 bytes ...
The MTU is a property of the network link at hand, not controllable in the
same way as the IP datagram size, which can be limited by setting the TCP MSS.
And, if the MTU were settable, avoiding fragmentation would call for setting
it to as large a value as possible, not limiting it.
In addition, to my understanding TCP implementations typically address
the presence of TCP options such that they eat the necessary space for
TCP options from the payload, not by increasing the IP datagram size
if TCP options are present. For example, if the SMSS is set to, let's say,
1460 octets, and a TCP sender adds a TCP timestamp option (12 bytes), it
will send only 1448 bytes of payload in a TCP segment?
What the draft now says in this respect is on the safe side, but it
might be overcautious. I don't remember any RFC saying how the SMSS and
adding options to a TCP segment are related. Maybe one of the TCP
implementers can shed more light on this?
We have tried to address your two comments above. On this matter, we had
received feedback that this measure (advertising an MSS smaller than 1220
bytes) would be safe, but we have tried to reflect that this might not be
necessary and might even be overcautious.
Seems fine, thanks.
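For what it's worth, to make the distinction concrete: on a general-purpose
stack, limiting the IP datagram size by setting the MSS is typically a
per-socket setting, not anything done to the link MTU. A minimal sketch with
a POSIX-style sockets API (the 1220 follows the IPv6 arithmetic above, i.e.,
1280 - 40 - 20; the function name and error handling are illustrative only
and not from the draft):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Illustrative sketch only: limit the IP datagram size by clamping the MSS
 * so that a full-sized segment fits into a 1280-byte IPv6 packet
 * (1280 - 40 IPv6 header - 20 TCP header = 1220 bytes). */
int open_small_segment_socket(void)
{
    int s = socket(AF_INET6, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    int mss = 1220;  /* value taken from the IPv6 discussion above */
    /* On Linux, setting this before connect() also caps the MSS announced
     * to the peer; not every stack lets you set it at all. */
    if (setsockopt(s, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) < 0)
        perror("TCP_MAXSEG");

    return s;
}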
The second to last para discussing IPv4 in this context is very
confusing. In particular, the 2nd sentence
"In IPv4, the MTU is 576 bytes."
is simply incorrect. In IPv4 the requirement is that any host must be
able to accept datagrams of up to 576 octets, but there is no upper
limit of 576 for the IPv4 MTU!
Indeed. The 2nd sentence missed the word "minimum" before "MTU", and we
have made several updates to the paragraph.
Maybe I was unclear in my comment. There is neither an upper nor a lower
limit of 576 for the IPv4 MTU. Definitely, the MTU can be less than 576.
Moreover, IPv4 requires that every node must be able to forward an IP packet
of 68 bytes without fragmentation, but even this is not exactly a minimum MTU
requirement. This is because IP is not necessarily able to fragment packets
smaller than 68 bytes. In other words, there is no similar requirement
for link layers to support a certain minimum MTU with IPv4 as there is
for IPv6, which requires link layers to support an MTU of at least 1280
bytes.
Therefore, I think it is hard to give similar advice for IPv4 as the
draft gives for IPv6.
Sec. 4.1.2.
"In such traffic patterns, it is more difficult to
detect packet loss without retransmission timeouts ..."
->
"In such traffic patterns, it is more difficult and often
impossible to detect packet loss without retransmission timeouts
unless ECN is enabled ..."
Done.
"When the congestion window of a TCP sender has a
size of one segment, the TCP sender resets the retransmit timer, and
the sender will only be able to send a new packet when the retransmit
timer expires [RFC3168]. Effectively, the TCP sender reduces at that
moment its sending rate from 1 segment per Round Trip Time (RTT) to 1
segment per RTO, which can result in a very low throughput. In
addition to better throughput, ECN can also help reducing latency and
jitter."
This text is somewhat inaccurate in terms of how ECN works if only a
single segment is in flight (cwnd = 1 MSS) and confusing when it says
"which can result in a very low throughput". The latter is kind of true,
but also necessary to avoid congestion and, after all, it may result in
higher throughput compared to the case where ECN is not used and
retransmissions are needed.
I'd rephrase the above to something along the lines:
"When the congestion window of a TCP sender has a
size of one segment and a TCP ACK with an ECN signal (ECE flag) arrives
at the TCP sender, the TCP sender resets the retransmit timer, and
the sender will only be able to send a new packet when the retransmit
timer expires. Effectively, the TCP sender reduces at that
moment its sending rate from 1 segment per Round Trip Time (RTT) to 1
segment per RTO and reduces the sending rate further on each ECN signal
received in subsequent TCP ACKs. Otherwise, if an ECN signal is not
present in a subsequent TCP ACK, the TCP sender resumes the normal
ack-clocked transmission of segments [RFC3168]."
Thank you very much for the proposed text. The draft has been updated
accordingly.
It might be also good to rearrange the text in the second and third
paragraphs to discuss the effect of retransmissions and timeouts more
coherently. I may suggest text for this.
We will welcome any suggestion you may have in this regard.
I'm fine with the text as is. It seems to me, after all, that rearranging
the text is not crucial. A reader should get the intended message,
although a rewrite might help the reader.
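As a side note, and not something the draft needs to say: on a
general-purpose Linux peer, ECN negotiation for TCP is a system-wide knob
rather than a per-socket option. A small sketch, Linux-specific and purely
illustrative:

#include <stdio.h>

/* Linux-specific, illustrative only: ask the stack to negotiate ECN on
 * outgoing TCP connections as well (0 = off, 1 = request and accept,
 * 2 = accept only, which is the usual default). Needs root. */
int enable_tcp_ecn(void)
{
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_ecn", "w");
    if (f == NULL)
        return -1;
    fputs("1\n", f);
    fclose(f);
    return 0;
}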
Sec 4.2.
"This section discusses TCP stacks that focus on transferring a single
MSS."
Maybe better:
"This section discusses TCP stacks that allow transferring only a single
MSS at a time."
Done.
Sorry, but now I am a bit confused here with the new naming. Earlier this
was titled "single MSS-windows and buffers" and "single-MSS stacks", and
the text indicated that such a TCP stack reduces its maximum advertised
window to one MSS and may only hold one MSS of data in its send and
receive buffers. Further, the text indicated, and still indicates, that
such a stack may, however, have several TCP segments in flight as long as
they carry at most one MSS worth of data in total. So, my comments were
based on that assumption.
Now the new way of naming them "single-segment stacks" hints that these
are even more restricted, such that the stack has, e.g., limited capability
for bookkeeping and may only hold a single TCP segment in its one-MSS send
or receive buffer at a time, regardless of the payload size of the
segment, and thereby possibly have only(?) a single data segment in
flight in each direction at any point of time. That is, once such a TCP
sender has received some data from the application, it does not accept
more data from the application until a cumulative Ack has arrived and
released the send buffer for the next piece of app data. And such a TCP
receiver advertises a zero window once it receives a data segment carrying
any amount of data (e.g., only a single byte).
Or, there obviously can be different variants of these two behaviors
above, for example uIP, which releases the send buffer after transmitting
the segment and requires the app to provide the same data for a possible
retransmission. Or, a receiving TCP always delivers the segment
immediately to the application (and the receive buffer is possibly
provided by the application).
So, what exactly is meant by a "single-segment stack"? Both variants
above are possible, or various flavors of them, and possibly
both types of stacks exist at least for the TCP send buffer
implementation? Single-segment receive buffer implementations that keep
the segment in the TCP receive buffer until an application receives it
would have a hard time working decently with a regular TCP stack in certain
fully legitimate scenarios, though. Therefore, they possibly are
non-existent?
The definition affects the text in many parts of the draft as well as my
comments. For now, I continue with my original interpretation and assume
that the definition of "single-MSS buffers" holds. Otherwise, the split hack
would not be possible at all if the TCP sender is not able to compose
more than one segment at a time, and it would result in really bad
performance with a receiver stack that may accept only a single segment
at a time, right?
Sec 4.2.1.
Last sentence:
"For this use of CoAP, a maximum TCP window
of one MSS will be sufficient."
This is not necessarily true. If both TCP stacks involved allow a TCP
window larger than 1 MSS and a CoAP request or response larger than one
MSS is in use, it can be delivered more efficiently than in the case where
the max TCP window is one MSS.
The above part of the comment is addressed in the new text, thank you.
Furthermore, this may also not be the case if a CoAP over TCP
application uses short-lived TCP connections. Why? Because then the
mandatory CSM message that each CoAP endpoint sends after the 3WHS may
introduce an additional RTT as it cannot necessarily be sent during the
same RTT with the first CoAP request/response. Of course, if the CSM
message and the first CoAP request/response message fit into a single
MSS and the TCP Nagle algorithm is disabled, such a single MSS window
does not result in an additional RTT.
... but this part is not?
It is true that a CoAP endpoint is not allowed to send a new
application message until a response to the previous one has been
received. However, sending a CSM message is a mandatory artefact
of CoAP over TCP, and it is sent as the first msg over the TCP
connection, in both directions. This CSM message eats a part of the one-MSS
window. The CoAP over TCP spec allows sending the first application
message immediately after the CSM has been transmitted. However, a TCP
window of one MSS may prevent TCP from transmitting the app msg until
the TCP Ack for the TCP segment carrying the CSM has arrived (depending on
the size of these msgs and the MSS). Therefore, the performance may degrade
notably for apps running on a single-segment or single-MSS stack at either
end and using short-lived TCP connections, as they may need to first wait
for the TCP Ack for the CSM and only then can the first application
message be transmitted. The need to wait for the TCP Ack of the CSM
always arises if Nagle is enabled; if Nagle is disabled, it may arise
due to the lack of buffer space to hold both the CSM and the first data
segment carrying the first application message.
Not quite sure how to address this nicely, since it is a small
CoAP over TCP detail that mostly affects short-lived TCP connections but
depends on the size of the first msgs and MSS (as well as Nagle). That
is, it has an effect under certain conditions, not necessarily always. And
it may have a significant effect or only a negligible effect.
We have updated the sentence accordingly.
Sec. 4.2.2.
"A TCP implementation for a constrained device that uses a single-MSS
TCP receive or transmit window size may not benefit from supporting
the following TCP options: Window scale [RFC7323], TCP Timestamps
[RFC7323], Selective Acknowledgments (SACK) and SACK-Permitted
[RFC2018]."
It may be useful to mention that a TCP sender can benefit from
Timestamps in detecting spurious RTOs that are quite likely to occur
in CNN scenarios.
Done.
Sec. 4.2.3.
2nd para:
"A device that advertises a single-MSS receive window should avoid use
of Delayed ACKs in order to avoid contributing unnecessary delay (of
up to 500 ms) to the RTT [RFC5681], which limits the throughput and
can increase the data delivery time."
This should not appear as a generic recommendation, as it is not
correct for some typical usage scenarios such as request-response
traffic where the node with a single-MSS receive window is the server
sending the responses. With delayed ACKs it can piggyback the TCP ACK
with the response if the response is sent before the delayed ACK timer
expires, thus avoiding unnecessary pure TCP ACKs.
So, here, like in Sec 4.3.2., it is important to indicate that it depends
on the communication pattern whether delayed ACKs are useful or harmful.
We have modified this text accordingly.
3rd para:
"A device that can send at most one MSS of data is significantly
affected if the receiver uses Delayed ACKs, e.g., if a TCP server or
receiver is outside the CNN."
Again, this does not hold in all cases. E.g., if the server is outside
of the CNN and request-response communication is used.
We have modified and reorganized the text accordingly.
The "split hack" is not advisable workaround. First for the reason
stated in the end of the para, but more importantly because it simply
does not necessarily even work; a TCP receiver is requited to
acknowledge every second full-sized segment, but not two consecutive
small segments.
Agreed. This comment has been incorporated into the text. We discuss the
"split hack", but we do not recommend it.
Fine. May I suggest a minor additional tweak:
"A standard compliant TCP receiver will acknowledge the second MSS of
data, ..."
->
"A standard compliant TCP receiver may immediately acknowledge the second
MSS of data, ..."
4th para:
"Similar issues happen when a sender uses the Nagle algorithm.
Disabling the algorithm will not have impact if the sender can only
handle stop-and-wait operation."
Actually it does have an impact in some specific usage scenarios, e.g.,
with CoAP over TCP, disabling the Nagle algorithm allows sending the
mandatory CSM message and the first CoAP msg (request), and possibly
also the CSM and the first response, without unnecessarily waiting for a TCP
ACK of the CSM msg. This is of particular impact if short-lived TCP
connections are in use with CoAP over TCP.
The text above refers to stop-and-wait operation. In your example above,
were you considering stop-and-wait operation?
Yes, I was considering stop-and-wait operation at the application level in
the context of CoAP over TCP, which happens to be a weird thing. Even
though a single CoAP request msg may be issued at a time, the
transmission of the CSM message at the beginning of the TCP connection
breaks the stop-and-wait operation.
But I'm no longer sure how to interpret the text. What do "similar
issues" now refer to?
It may be useful to consider moving the discussion on Nagle to Sec 4.2.1,
in the context of CoAP over TCP where it has an impact, and not discuss it
here at all, because it is not that necessary to say here that "Nagle is
a no-op"?
Sec. 4.2.4.
RTO is not estimated but calculated using estimated RTT and deviation
from it. That is, modify:
RTO estimation -> RTO calculation
Done.
2nd para:
"[RFC6298] describes the standard TCP RTO algorithm."
You may delete this sentence and cite RFC 6298 in the first sentence
of Sec 4.2.4, where the algorithm is first mentioned.
Done.
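To make the wording point concrete, the RFC 6298 rules calculate the RTO
from the smoothed RTT estimate and its variation, roughly as in the sketch
below (clock-granularity handling and the usual fixed-point shifts are
omitted for brevity; this is only an illustration, not text for the draft):

/* RFC 6298 in a nutshell: the RTT is estimated, the RTO is then calculated.
 * Floating point is used here only for readability. */
struct rto_state {
    double srtt;    /* smoothed RTT estimate */
    double rttvar;  /* RTT variation */
    double rto;     /* resulting retransmission timeout */
    int    have_sample;
};

void rto_update(struct rto_state *s, double r)  /* r = new RTT sample, seconds */
{
    const double alpha = 1.0 / 8, beta = 1.0 / 4, k = 4.0;

    if (!s->have_sample) {
        s->srtt   = r;
        s->rttvar = r / 2;
        s->have_sample = 1;
    } else {
        double err = s->srtt - r;             /* uses the old SRTT */
        s->rttvar = (1 - beta) * s->rttvar + beta * (err < 0 ? -err : err);
        s->srtt   = (1 - alpha) * s->srtt + alpha * r;
    }
    s->rto = s->srtt + k * s->rttvar;
    if (s->rto < 1.0)   /* RFC 6298 recommends a 1-second minimum */
        s->rto = 1.0;
}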
3rd para:
"As an example, an adaptive RTO algorithm for CoAP over UDP has been
defined [I-D.ietf-core-cocoa] that has been found to perform well in
CNN scenarios [Commag]."
Maybe not a good idea to cite the current version of the CoCoA RTO
algorithm (v3), which has also been found to have detrimental behavior?
Done.
Maybe I should not say this, but one could possibly cite here more than one
alternative that has been experimentally shown to perform well ;)
Sec. 4.3.1.
"Assuming that Delayed ACKs are used by the receiver, the
mentioned algorithms work efficiently for window sizes of at least 5
MSS: If in a given TCP transmission of segments 1, 2, 3, 4, 5, and 6
the segment 2 gets lost, the sender should get an ACK for segment 1
when 3 arrives and duplicate acknowledgements when 4, 5, and 6
arrive. It will retransmit segment 2 when the third duplicate ACK
arrives. In order to have segment 2, 3, 4, 5, and 6 sent, the window
has to be at least 5 MSS. With an MSS of 1220 byte, a buffer of the
size of 5 MSS would require 6100 bytes."
The requirement for the window size to be of at least 5 segments does
not hold if Limited Transmit is in use.
We have updated the text (including that now we mention "up to 5 MSS"),
along with updates in the Limited Transmit paragraph.
Also, the requirement of at least 5 segments is valid only if the ACK
for segment 1 was held by the DelAck timer, i.e., the requirement
holds with approx. 50% probability. That is, if segment 1 got
acknowledged immediately (because there was also a segment before segment 1
that was held by the DelAck timer), only a window size of 4 MSS is needed.
We have modified the text accordingly.
I have a similar comment as Ilpo here. It's worth clarifying the example
by removing the potential interpretation that segments 1-6 can be in
flight as a starting point with a cwnd of 5 MSS (assuming they are all
full-sized segments).
Note also that in some traffic scenarios where Nagle is disabled and a TCP
sender does not send MSS-sized segments but smaller segments, it is quite
possible to leverage FR/FR even with window sizes smaller than 5 MSS
(actually, even with a window size of one MSS, 3 dupacks and FR/FR may be
possible).
"For bulk data transfers further TCP improvements may also be useful,
such as limited transmit [RFC3042].
Limited Transmit is not useful only for bulk data transfers but for any
transfer that has more than one segment in flight. Small transfers tend
to benefit more, because they are more likely to not receive enough
dupacks.
We have modified the paragraph accordingly.
Fine, but it might be useful to start the 2nd para that discusses
Limited Transmit by noting that the example in the 1st para assumed
that Limited Transmit is not in use.
In addition, a cwnd allowing 2 segments in flight would actually be
enough to trigger sending segments 1-5. The only difference compared to a
cwnd of 3 segments is that one needs to wait for the DelAck timer to expire
for segment 1.
Sec. 4.3.1.1.
"... a sender (having previously sent the SACK-Permitted
option) can avoid performing unnecessary retransmissions, saving
energy and bandwidth, as well as reducing latency."
It might be worth mentioning also that SACK often allows for faster
loss recovery when there is more than one lost segment in a window of
data (i.e., recovery with less RTTs).
Done.
Fine. I'd suggest editing the sentence slightly, because it is not
guaranteed that recovery with SACK takes fewer RTTs:
"...since with SACK recovery requires less RTTs."
-->
"...since with SACK recovery may complete with less RTTs."
Sec. 4.3.2.
Disabling delayed ACKs on a client for infrequent request-response
traffic with small messages might be advisable, too. It would allow
an immediate ACK for the data segment carrying the response.
This comment holds for Sec. 4.2.3 as well.
Added, thanks.
The text here would benefit from a more accurate expression of at which end
the delayed Acks should be turned on and where turned off (e.g., "sender"
does not specify the endpoint, as there are both an app-level (data) sender
and a TCP sender at each end for request-response type of traffic).
Maybe something like:
For request-response traffic, enabling Delayed ACKs is recommended,
in order to allow combining a response with the ACK into
a single segment, thus increasing efficiency. In this case,
disabling Delayed ACKs at the sender allows an immediate
ACK for the data segment carrying the response.
-->
For request-response traffic, enabling Delayed ACKs is recommended at
the server end, in order to allow combining a response with the ACK into
a single segment, thus increasing efficiency. In addition, if
a client issues requests infrequently, disabling Delayed
ACKs at the client allows an immediate ACK for the data segment
carrying the response.
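As an illustration of the client side of this (Linux-specific sketch, not
draft text): delayed ACKs cannot be disabled portably, but TCP_QUICKACK
approximates it; note that the flag is not sticky and needs to be re-armed,
e.g., after each read:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux-specific, illustrative only: ask for an immediate ACK rather than
 * a delayed one. The option is not permanent, so a client issuing
 * infrequent requests would re-set it around each receive. */
static void request_quick_ack(int sock)
{
    int on = 1;
    (void)setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}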
Sec. 5.3.
"A mean TCP NAT binding timeout of 386
minutes has been reported, while in some cases, inactivity timeouts
are in the order of a few minutes [HomeGateway].
Reporting just the mean TCP NAT binding timeout from [HomeGateway] does
not give a correct view of the results in this study, because the
measured timeouts were highly variable and some devices had a very long
timeout (or no timeout at all), yielding a very high mean timeout value.
Therefore, we reported the median, and it would be more descriptive to report
it here as well. The median of the measured TCP NAT binding timeouts in
this study was around 60 mins, the shortest being around 2 mins. That is,
clearly more than 50% of the devices had a timeout shorter than the RFC
5382 recommended minimum of 124 mins.
In light of these results, it may be hard to find a proper timeout
value for the application-layer heartbeat messages, and this might be
worth mentioning, I think.
We have updated the section based on these comments.
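As a related aside (illustrative only, not something the section has to
include): where TCP keep-alives are used to keep such a NAT binding alive, a
general-purpose peer can tune the keep-alive period per socket. A
Linux-specific sketch, with made-up values chosen to stay below even a
roughly 2-minute binding timeout:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux-specific, illustrative only: periodic keep-alives well below a
 * (possibly short) NAT binding timeout. The values are examples, not
 * recommendations from the draft. */
static int enable_keepalive(int sock)
{
    int on = 1, idle = 90, intvl = 30, count = 3;

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle))  < 0 ||
        setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
        setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT,   &count, sizeof(count)) < 0)
        return -1;
    return 0;
}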
Nits:
All done, except for the "Bit-Error Rate" suggestion. The Collins English
dictionary uses the non-hyphenated form.
Sec 1:
1st para: Add references and cite 6LoWPAN, RPL, and CoAP.
3rd para: "At the application layer, CoAP was developed over UDP
[RFC7252]."
- this seems to cite UDP incorrectly while the intent is to cite CoAP.
If you cite CoAP in the first para, you do not need to cite here at
all.
"This the main reason..." -> "This is the main reason..."
5th para:
"Given the limited resources on constrained devices, careful "tuning"
of the TCP implementation can make an implementation more lightweight."
Instead of saying "tuning" of the TCP implementation, I'd say that careful
selection of optional TCP features can make an implementation more
lightweight (and improve operation in CNNs).
6th para:
"This document provides guidance on how to implement and use TCP in
CNNs.
->
"This document provides guidance on how to implement and configure TCP
as well as how TCP is advisable to be used by applications in CNNs.
Sec 3.1., last para:
high bit error rate -> high bit-error rate
Sec 5., first sentence:
"how a TCP stack can be used" -> "how TCP can be used"
Once again, thank you very much for your comprehensive review and
constructive suggestions!
Cheers,
Carles (on behalf of all authors)