Dear Carles, all,
it took longer than I expected a week or so ago to find time to go
through the draft. My apologies, and thanks for your patience.
It looks fine to me, with a few exceptions that I think still need some
work.
Please see inline.
Cheers,
/Markku
On Wed, 5 Jun 2019, Carles Gomez Montenegro wrote:
Dear Markku,
Thank you very much for your comprehensive and detailed review of the
draft. Your constructive comments have been very useful to address issues
and improve the quality of the document. Our updates can be found in
revision -08.
Please find below our inline responses to your comments.
Hi all,
I have reviewed the -07 version of this draft for the WGLC.
The draft is very useful for many developers using or considering using
TCP in CNN scenarios. It has improved a lot from the previous
versions, but there are still a number of issues worth addressing.
See comments below.
Best regards,
/Markku
Sec 4.1.
Title: Path properties
- Would it be better to use "Addressing path properties" or something
similar as TCP cannot (much) affect the properties?
Sounds good!
Sec. 4.1.1.
If I understand it correctly, the discussion in this section is intended
to be all about avoiding Path MTU Discovery when running TCP over IPv6?
Therefore, "Avoiding Path MTU Discovery in IPv6" would probably be
a better title?
From our point of view, the section is actually about setting the MSS to a
suitable value, which then may help avoid the need to support Path MTU
Discovery, as well as the need to perform IP-layer fragmentation at the
source. We have explicitly added the latter in -08.
This seems fine now, except I noticed another problem that I didn't spot
last time. The text says a few times something about
limiting/setting the MTU, where you actually want to advise limiting the IP
datagram size (by setting the MSS), just like you say above. But the text
reads, for example:
... it may be desirable to limit the MTU to 1280 bytes ...
which I believe should read something like:
... it may be desirable to limit the IP datagram size to 1280 bytes ...
The MTU is a property of the network link at hand, not controllable in the
same way as the IP datagram size, which can be limited by setting the TCP MSS.
And, if the MTU were settable, avoiding fragmentation would call for setting
it to as large a value as possible, not limiting it.
In addition, to my understanding TCP implementations typically address
the presence of TCP options such that they eat the necessary space for
TCP options from the payload, not by increasing the IP datagram size
if TCP options are present. For example, if the SMSS is set to, let's say,
1460 octets, and a TCP sender adds a TCP timestamp option (12 bytes), it
will send only 1448 bytes of payload in a TCP segment?
What the draft now says in this respect is on the safe side, but it
might be overcautious. I don't remember any RFC saying how the SMSS and
adding options to a TCP segment are related. Maybe one of the TCP
implementers can shed more light on this?
We have tried to address your two comments above. On this matter, we had
received feedback that this measure (advertising an MSS smaller than 1220
bytes) would be safe, but we have tried to reflect that this might not be
necessary and might even be overcautious.
Seems fine, thanks.
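For what it's worth, to make the distinction concrete: on a general-purpose
stack, limiting the IP datagram size by setting the MSS is typically a
per-socket setting, not anything done to the link MTU. A minimal sketch with
a POSIX-style sockets API (the 1220 follows the IPv6 arithmetic above, i.e.,
1280 - 40 - 20; the function name and error handling are illustrative only
and not from the draft):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Illustrative sketch only: limit the IP datagram size by clamping the MSS
 * so that a full-sized segment fits into a 1280-byte IPv6 packet
 * (1280 - 40 IPv6 header - 20 TCP header = 1220 bytes). */
int open_small_segment_socket(void)
{
    int s = socket(AF_INET6, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    int mss = 1220;  /* value taken from the IPv6 discussion above */
    /* On Linux, setting this before connect() also caps the MSS announced
     * to the peer; not every stack lets you set it at all. */
    if (setsockopt(s, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) < 0)
        perror("TCP_MAXSEG");

    return s;
}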
The second to last para discussing IPv4 in this context is very
confusing. In particular, the 2nd sentence
"In IPv4, the MTU is 576 bytes."
is simply incorrect. In IPv4 the requirement is that any host must be
able to accept datagrams of up to 576 octets, but there is no upper
limit of 576 for the IPv4 MTU!
Indeed. The 2nd sentence missed the word "minimum" before "MTU", and we
have made several updates to the paragraph.
Maybe I was unclear in my comment. There is neither an upper nor a lower
limit of 576 for the IPv4 MTU. Definitely, the MTU can be less than 576.
Moreover, IPv4 requires that every node must be able to forward an IP packet
of 68 bytes without fragmentation, but even this is not exactly a minimum MTU
requirement. This is because IP is not necessarily able to fragment packets
smaller than 68 bytes. In other words, there is no similar requirement
for link layers to support a certain minimum MTU with IPv4 as there is
for IPv6, which requires link layers to support an MTU of at least 1280
bytes.
Therefore, I think it is hard to give similar advice for IPv4 as the
draft gives for IPv6.
Sec. 4.1.2.
"In such traffic patterns, it is more difficult to
detect packet loss without retransmission timeouts ..."
->
"In such traffic patterns, it is more difficult and often
impossible to detect packet loss without retransmission timeouts
unless ECN is enabled ..."
Done.
"When the congestion window of a TCP sender has a
size of one segment, the TCP sender resets the retransmit timer, and
the sender will only be able to send a new packet when the retransmit
timer expires [RFC3168]. Effectively, the TCP sender reduces at that
moment its sending rate from 1 segment per Round Trip Time (RTT) to 1
segment per RTO, which can result in a very low throughput. In
addition to better throughput, ECN can also help reducing latency and
jitter."
This text is somewhat inaccurate in terms of how ECN works if only a
single segment is in flight (cwnd = 1 MSS) and confusing when it says
"which can result in a very low throughput". The latter is kind of true,
but also necessary to avoid congestion and, after all, it may result in
higher throughput compared to the case where ECN is not used and
retransmissions are needed.
I'd rephrase the above to something along the lines:
"When the congestion window of a TCP sender has a
size of one segment and a TCP ACK with an ECN signal (ECE flag) arrives
at the TCP sender, the TCP sender resets the retransmit timer, and
the sender will only be able to send a new packet when the retransmit
timer expires. Effectively, the TCP sender reduces at that
moment its sending rate from 1 segment per Round Trip Time (RTT) to 1
segment per RTO and reduces the sending rate further on each ECN signal
received in subsequent TCP ACKs. Otherwise, if an ECN signal is not
present in a subsequent TCP ACK, the TCP sender resumes the normal
ack-clocked transmission of segments [RFC3168]."
Thank you very much for the proposed text. The draft has been updated
accordingly.
It might be also good to rearrange the text in the second and third
paragraphs to discuss the effect of retransmissions and timeouts more
coherently. I may suggest text for this.
We will welcome any suggestion you may have in this regard.
I'm fine with the text as is. It seems to me, after all, that rearranging
the text is not crucial. A reader should get the intended message,
although a rewrite might help the reader.
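As a side note, and not something the draft needs to say: on a
general-purpose Linux peer, ECN negotiation for TCP is a system-wide knob
rather than a per-socket option. A small sketch, Linux-specific and purely
illustrative:

#include <stdio.h>

/* Linux-specific, illustrative only: ask the stack to negotiate ECN on
 * outgoing TCP connections as well (0 = off, 1 = request and accept,
 * 2 = accept only, which is the usual default). Needs root. */
int enable_tcp_ecn(void)
{
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_ecn", "w");
    if (f == NULL)
        return -1;
    fputs("1\n", f);
    fclose(f);
    return 0;
}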
Sec 4.2.
"This section discusses TCP stacks that focus on transferring a single
MSS."
Maybe better:
"This section discusses TCP stacks that allow transferring only a single
MSS at a time."
Done.
Sorry, but now I am a bit confused here with the new naming. Earlier this
was titled "single MSS-windows and buffers" and "single-MSS stacks", and
the text indicated that such a TCP stack reduces its maximum advertised
window to one MSS and may only hold one MSS of data in its send and
receive buffers. Further, the text indicated, and still indicates, that
such a stack may, however, have several TCP segments in flight as long as
they carry at most one MSS worth of data in total. So, my comments were
based on that assumption.
Now the new way of naming them "single-segment stacks" hints that these
are even more restricted, such that the stack has, e.g., limited capability
for bookkeeping and may only hold a single TCP segment in its one-MSS send
or receive buffer at a time, regardless of the payload size of the
segment, and thereby possibly have only(?) a single data segment in
flight in each direction at any point of time. That is, once such a TCP
sender has received some data from the application, it does not accept
more data from the application until a cumulative Ack has arrived and
released the send buffer for the next piece of app data. And such a TCP
receiver advertises a zero window once it receives a data segment carrying
any amount of data (e.g., only a single byte).
Or, there obviously can be different variants of these two behaviors
above, for example uIP, which releases the send buffer after transmitting
the segment and requires the app to provide the same data for a possible
retransmission. Or, a receiving TCP always delivers the segment
immediately to the application (and the receive buffer is possibly
provided by the application).
So, what exactly is meant by a "single-segment stack"? Both variants
above are possible, or various flavors of them, and possibly
both types of stacks exist at least for the TCP send buffer
implementation? Single-segment receive buffer implementations that keep
the segment in the TCP receive buffer until an application receives it
would have a hard time working decently with a regular TCP stack in certain
fully legitimate scenarios, though. Therefore, they possibly are
non-existent?
The definition affects the text in many parts of the draft as well as my
comments. For now, I continue with my original interpretation and assume
that the definition of "single-MSS buffers" holds. Otherwise, the split hack
would not be possible at all if the TCP sender is not able to compose
more than one segment at a time, and it would result in really bad
performance with a receiver stack that may accept only a single segment
at a time, right?
Sec 4.2.1.
Last sentence:
"For this use of CoAP, a maximum TCP window
of one MSS will be sufficient."
This is not necessarily true. If both TCP stacks involved allow a TCP
window larger than 1 MSS and a CoAP request or response larger than one
MSS is in use, it can be delivered more efficiently than in the case where
the max TCP window is one MSS.
The above part of the comment is addressed in the new text, thank you.
Furthermore, this may also not be the case if a CoAP over TCP
application uses short-lived TCP connections. Why? Because then the
mandatory CSM message that each CoAP endpoint sends after the 3WHS may
introduce an additional RTT as it cannot necessarily be sent during the
same RTT with the first CoAP request/response. Of course, if the CSM
message and the first CoAP request/response message fit into a single
MSS and the TCP Nagle algorithm is disabled, such a single MSS window
does not result in an additional RTT.
... but this part is not?
It is true that a CoAP endpoint is not allowed to send a new
application message until a response to the previous one has been
received. However, sending a CSM message is a mandatory artefact
of CoAP over TCP, and it is sent as the first msg over the TCP
connection, in both directions. This CSM message eats a part of the one-MSS
window. The CoAP over TCP spec allows sending the first application
message immediately after the CSM has been transmitted. However, a TCP
window of one MSS may prevent TCP from transmitting the app msg until
the TCP Ack for the TCP segment carrying the CSM has arrived (depending on
the size of these msgs and the MSS). Therefore, the performance may degrade
notably for apps running on a single-segment or single-MSS stack at either
end and using short-lived TCP connections, as they may need to first wait
for the TCP Ack for the CSM and only then can the first application
message be transmitted. The need to wait for the TCP Ack of the CSM
always arises if Nagle is enabled; if Nagle is disabled, it may arise
due to the lack of buffer space to hold both the CSM and the first data
segment carrying the first application message.
Not quite sure how to address this nicely, since it is a small
CoAP over TCP detail that mostly affects short-lived TCP connections but
depends on the size of the first msgs and MSS (as well as Nagle). That
is, it has an effect under certain conditions, not necessarily always. And
it may have a significant effect or only a negligible effect.
We have updated the sentence accordingly.
Sec. 4.2.2.
"A TCP implementation for a constrained device that uses a single-MSS
TCP receive or transmit window size may not benefit from supporting
the following TCP options: Window scale [RFC7323], TCP Timestamps
[RFC7323], Selective Acknowledgments (SACK) and SACK-Permitted
[RFC2018]."
It may be useful to mention that a TCP sender can benefit from
Timestamps in detecting spurious RTOs that are quite likely to occur
in CNN scenarios.
Done.
Sec. 4.2.3.
2nd para:
"A device that advertises a single-MSS receive window should avoid use
of Delayed ACKs in order to avoid contributing unnecessary delay (of
up to 500 ms) to the RTT [RFC5681], which limits the throughput and
can increase the data delivery time."
This should not appear as a generic recommendation, as it is not
correct for some typical usage scenarios such as request-response
traffic where the node with a single-MSS receive window is the server
sending the responses. With delayed ACKs it can piggyback the TCP ACK
with the response if the response is sent before the delayed ACK timer
expires, thus avoiding unnecessary pure TCP ACKs.
So, here, like in Sec 4.3.2., it is important to indicate that it depends
on the communication pattern whether delayed ACKs are useful or harmful.
We have modified this text accordingly.
3rd para:
"A device that can send at most one MSS of data is significantly
affected if the receiver uses Delayed ACKs, e.g., if a TCP server or
receiver is outside the CNN."
Again, this does not hold in all cases. E.g., if the server is outside
of the CNN and request-response communication is used.
We have modified and reorganized the text accordingly.
The "split hack" is not advisable workaround. First for the reason
stated in the end of the para, but more importantly because it simply
does not necessarily even work; a TCP receiver is requited to
acknowledge every second full-sized segment, but not two consecutive
small segments.
Agreed. This comment has been incorporated into the text. We discuss the
"split hack", but we do not recommend it.
Fine. May I suggest a minor additional tweak:
"A standard compliant TCP receiver will acknowledge the second MSS of
data, ..."
->
"A standard compliant TCP receiver may immediately acknowledge the second
MSS of data, ..."
4th para:
"Similar issues happen when a sender uses the Nagle algorithm.
Disabling the algorithm will not have impact if the sender can only
handle stop-and-wait operation."
Actually it does have an impact in some specific usage scenarios, e.g.,
with CoAP over TCP, disabling the Nagle algorithm allows sending the
mandatory CSM message and the first CoAP msg (request), and possibly
also the CSM and the first response, without unnecessarily waiting for a TCP
ACK of the CSM msg. This is of particular impact if short-lived TCP
connections are in use with CoAP over TCP.
The text above refers to stop-and-wait operation. In your example above,
were you considering stop-and-wait operation?
Yes, I was considering stop-and-wait operation at the application level in
the context of CoAP over TCP, which happens to be a weird thing. Even
though a single CoAP request msg may be issued at a time, the
transmission of the CSM message at the beginning of the TCP connection
breaks the stop-and-wait operation.
But I'm no longer sure how to interpret the text. What do "similar
issues" now refer to?
It may be useful to consider moving the discussion on Nagle to Sec 4.2.1,
in the context of CoAP over TCP where it has an impact, and not discuss it
here at all, because it is not that necessary to say here that "Nagle is
a no-op"?
Sec. 4.2.4.
RTO is not estimated but calculated using estimated RTT and deviation
from it. That is, modify:
RTO estimation -> RTO calculation
Done.
2nd para:
"[RFC6298] describes the standard TCP RTO algorithm."
You may delete this sentence and cite RFC 6298 in the first sentence
of Sec 4.2.4, where the algorithm is first mentioned.
Done.
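To make the wording point concrete, the RFC 6298 rules calculate the RTO
from the smoothed RTT estimate and its variation, roughly as in the sketch
below (clock-granularity handling and the usual fixed-point shifts are
omitted for brevity; this is only an illustration, not text for the draft):

/* RFC 6298 in a nutshell: the RTT is estimated, the RTO is then calculated.
 * Floating point is used here only for readability. */
struct rto_state {
    double srtt;    /* smoothed RTT estimate */
    double rttvar;  /* RTT variation */
    double rto;     /* resulting retransmission timeout */
    int    have_sample;
};

void rto_update(struct rto_state *s, double r)  /* r = new RTT sample, seconds */
{
    const double alpha = 1.0 / 8, beta = 1.0 / 4, k = 4.0;

    if (!s->have_sample) {
        s->srtt   = r;
        s->rttvar = r / 2;
        s->have_sample = 1;
    } else {
        double err = s->srtt - r;             /* uses the old SRTT */
        s->rttvar = (1 - beta) * s->rttvar + beta * (err < 0 ? -err : err);
        s->srtt   = (1 - alpha) * s->srtt + alpha * r;
    }
    s->rto = s->srtt + k * s->rttvar;
    if (s->rto < 1.0)   /* RFC 6298 recommends a 1-second minimum */
        s->rto = 1.0;
}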
3rd para:
"As an example, an adaptive RTO algorithm for CoAP over UDP has been
defined [I-D.ietf-core-cocoa] that has been found to perform well in
CNN scenarios [Commag]."
Maybe not a good idea to cite the current version of the CoCoA RTO
algorithm (v3), which has also been found to have detrimental behavior?
Done.
Maybe I should not say this, but one could possibly cite here more than one
alternative that has been experimentally shown to perform well ;)
Sec. 4.3.1.
"Assuming that Delayed ACKs are used by the receiver, the
mentioned algorithms work efficiently for window sizes of at least 5
MSS: If in a given TCP transmission of segments 1, 2, 3, 4, 5, and 6
the segment 2 gets lost, the sender should get an ACK for segment 1
when 3 arrives and duplicate acknowledgements when 4, 5, and 6
arrive. It will retransmit segment 2 when the third duplicate ACK
arrives. In order to have segment 2, 3, 4, 5, and 6 sent, the window
has to be at least 5 MSS. With an MSS of 1220 byte, a buffer of the
size of 5 MSS would require 6100 bytes."
The requirement for the window size to be of at least 5 segments does
not hold if Limited Transmit is in use.
We have updated the text (including that now we mention "up to 5 MSS"),
along with updates in the Limited Transmit paragraph.
Also, the requirement of at least 5 segments is valid only if the ACK
for segment 1 was held by the DelAck timer, i.e., the requirement
holds with approx. 50% probability. That is, if segment 1 got
acknowledged immediately (because there was also a segment before segment 1
that was held by the DelAck timer), only a window size of 4 MSS is needed.
We have modified the text accordingly.
I have a similar comment as Ilpo here. It's worth clarifying the example
by removing the potential interpretation that segments 1-6 can be in
flight as a starting point with a cwnd of 5 MSS (assuming they are all
full-sized segments).
Note also that in some traffic scenarios where Nagle is disabled and a TCP
sender does not send MSS-sized segments but smaller segments, it is quite
possible to leverage FR/FR even with window sizes smaller than 5 MSS
(actually, even with a window size of one MSS, 3 dupacks and FR/FR may be
possible).
"For bulk data transfers further TCP improvements may also be useful,
such as limited transmit [RFC3042].
Limited Transmit is not useful only for bulk data transfers but for any
transfer that has more than one segment in flight. Small transfers tend
to benefit more, because they are more likely to not receive enough
dupacks.
We have modified the paragraph accordingly.
Fine, but it might be useful to start the 2nd para that discusses
Limited Transmit by noting that the example in the 1st para assumed
that Limited Transmit is not in use.
In addition, a cwnd allowing 2 segments in flight would actually be
enough to trigger sending segments 1-5. The only difference compared to a
cwnd of 3 segments is that one needs to wait for the DelAck timer to expire
for segment 1.
Sec. 4.3.1.1.
"... a sender (having previously sent the SACK-Permitted
option) can avoid performing unnecessary retransmissions, saving
energy and bandwidth, as well as reducing latency."
It might be worth mentioning also that SACK often allows for faster
loss recovery when there is more than one lost segment in a window of
data (i.e., recovery with less RTTs).
Done.
Fine. I'd suggest editing the sentence slightly, because it is not
guaranteed that recovery with SACK takes fewer RTTs:
"...since with SACK recovery requires less RTTs."
-->
"...since with SACK recovery may complete with less RTTs."
Sec. 4.3.2.
Disabling delayed ACKs on a client for infrequent request-response
traffic with small messages might be advisable, too. It would allow
an immediate ACK for the data segment carrying the response.
This comment holds for Sec. 4.2.3 as well.
Added, thanks.
The text here would benefit from a more accurate expression of at which end
the delayed Acks should be turned on and where turned off (e.g., "sender"
does not specify the endpoint, as there are both an app-level (data) sender
and a TCP sender at each end for request-response type of traffic).
Maybe something like:
For request-response traffic, enabling Delayed ACKs is recommended,
in order to allow combining a response with the ACK into
a single segment, thus increasing efficiency. In this case,
disabling Delayed ACKs at the sender allows an immediate
ACK for the data segment carrying the response.
-->
For request-response traffic, enabling Delayed ACKs is recommended at
the server end, in order to allow combining a response with the ACK into
a single segment, thus increasing efficiency. In addition, if
a client issues requests infrequently, disabling Delayed
ACKs at the client allows an immediate ACK for the data segment
carrying the response.
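As an illustration of the client side of this (Linux-specific sketch, not
draft text): delayed ACKs cannot be disabled portably, but TCP_QUICKACK
approximates it; note that the flag is not sticky and needs to be re-armed,
e.g., after each read:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux-specific, illustrative only: ask for an immediate ACK rather than
 * a delayed one. The option is not permanent, so a client issuing
 * infrequent requests would re-set it around each receive. */
static void request_quick_ack(int sock)
{
    int on = 1;
    (void)setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}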
Sec. 5.3.
"A mean TCP NAT binding timeout of 386
minutes has been reported, while in some cases, inactivity timeouts
are in the order of a few minutes [HomeGateway].
Reporting just the mean TCP NAT binding timeout from [HomeGateway] does
not give a correct view of the results in this study, because the
measured timeouts were highly variable and some devices had a very long
timeout (or no timeout at all), yielding a very high mean timeout value.
Therefore, we reported the median, and it would be more descriptive to report
it here as well. The median of the measured TCP NAT binding timeouts in
this study was around 60 mins, the shortest being around 2 mins. That is,
clearly more than 50% of the devices had a timeout shorter than the RFC
5382 recommended minimum of 124 mins.
In light of these results, it may be hard to find a proper timeout
value for the application-layer heartbeat messages, and this might be
worth mentioning, I think.
We have updated the section based on these comments.
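As a related aside (illustrative only, not something the section has to
include): where TCP keep-alives are used to keep such a NAT binding alive, a
general-purpose peer can tune the keep-alive period per socket. A
Linux-specific sketch, with made-up values chosen to stay below even a
roughly 2-minute binding timeout:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux-specific, illustrative only: periodic keep-alives well below a
 * (possibly short) NAT binding timeout. The values are examples, not
 * recommendations from the draft. */
static int enable_keepalive(int sock)
{
    int on = 1, idle = 90, intvl = 30, count = 3;

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle))  < 0 ||
        setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
        setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT,   &count, sizeof(count)) < 0)
        return -1;
    return 0;
}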
Nits:
All done, except for the "Bit-Error Rate" suggestion. The Collins English
dictionary uses the non-hyphenated form.
Sec 1:
1st para: Add references and cite 6LoWPAN, RPL, and CoAP.
3rd para: "At the application layer, CoAP was developed over UDP
[RFC7252]."
- this seems to cite UDP incorrectly while the intent is to cite CoAP.
If you cite CoAP in the first para, you do not need to cite here at
all.
"This the main reason..." -> "This is the main reason..."
5th para:
"Given the limited resources on constrained devices, careful "tuning"
of the TCP implementation can make an implementation more lightweight."
Instead of saying "tuning" of the TCP implementation, I'd say that careful
selection of optional TCP features can make an implementation more
lightweight (and improve operation in CNNs).
6th para:
"This document provides guidance on how to implement and use TCP in
CNNs.
->
"This document provides guidance on how to implement and configure TCP
as well as how TCP is advisable to be used by applications in CNNs.
Sec 3.1., last para:
high bit error rate -> high bit-error rate
Sec 5., first sentence:
"how a TCP stack can be used" -> "how TCP can be used"
Once again, thank you very much for your comprehensive review and
constructive suggestions!
Cheers,
Carles (on behalf of all authors)