Jouni,
Here is the thread for your second major comment:
Major comment #2 from you:
>2) The PDM option relation to actual "server" time is somewhat confusing and
>the 5-tuple does not allow me to detect the real relationship between the
>server/application action that caused the generation of the packet and the
>PDM within the packet. This is specifically an issue with
>transport/application protocols that multiplex/interleave multiple
>application streams into one transport. I have no idea of the actual
>individual application time since the packets get generated independent of
>the processing of a single thread. I would welcome some discussion around
>here. Section 1.4 last paragraph is going to this direction but is not
>sufficient IMHO.
Yes, you are, of course, correct that all traffic will flow between the
matching ports at the two endpoints. The 5-tuples will match regardless of the
application.
The thing is that we never intended that PDM would distinguish between
applications using the same 5-tuple. That is, it is a feature, not a bug.
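To illustrate the point, here is a minimal Python sketch (not taken from the
draft) of why streams multiplexed onto one transport connection cannot be told
apart at the 5-tuple level. The packet dictionaries, the "stream" field, and
the flow_key helper are hypothetical. Only the 5-tuple field names follow the
draft.

```python
from collections import namedtuple

# 5-tuple field names as listed in the draft: SADDR, SPORT, DADDR, DPORT, PROTC.
FiveTuple = namedtuple("FiveTuple", "saddr sport daddr dport protc")


def flow_key(packet):
    """Return the 5-tuple that identifies the flow a packet belongs to."""
    return FiveTuple(
        packet["saddr"],
        packet["sport"],
        packet["daddr"],
        packet["dport"],
        packet["protc"],
    )


# Two packets from different (hypothetical) application streams that are
# multiplexed onto the same TCP connection.
pkt_a = {"saddr": "2001:db8::1", "sport": 49152,
         "daddr": "2001:db8::2", "dport": 443,
         "protc": "TCP", "stream": "orders"}
pkt_b = {"saddr": "2001:db8::1", "sport": 49152,
         "daddr": "2001:db8::2", "dport": 443,
         "protc": "TCP", "stream": "inventory"}

# The application streams differ, but the flow keys are identical, so any
# per-5-tuple measurement covers both streams together.
same_flow = flow_key(pkt_a) == flow_key(pkt_b)
```

Anything keyed on the 5-tuple therefore describes the connection as a whole,
not an individual application stream within it.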
What PDM WILL tell you is whether the problem is in the network or the host.
In our experience, which is primarily on networks for large data centers, a
different group troubleshoots the problem depending on its nature. That is, do
I get the application developers on the line, or the team that deals with the
routers and infrastructure?
One of the important functions of PDM is to allow you to do quick triage so
that you can get the right SWAT team going. PDM does not tell you if the
problem is in the IP stack or the application or buffer allocation. PDM also
does not tell you which of the network segments or middle boxes is at fault.
The reason for PDM is to get the right specialists in place who can then be
dispatched to investigate their area.
In our experience, valuable time is often lost at this first stage of triage.
Both the network group and the application group have quite a few specialized
tools at their disposal to further investigate their own areas.
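As a rough illustration of this triage step, here is a minimal Python sketch.
It assumes the observer has two numbers for a request/response pair, in the
same unit: a round-trip time seen at the client, and the server's delta time
(the draft's Delta Time Last Received, that is, the sending time of packet 2
minus the receive time of packet 1). The function name, parameter names, and
thresholds are hypothetical and not part of the draft.

```python
def triage(round_trip_ms, server_delta_ms, network_limit_ms, server_limit_ms):
    """Say which team to call first, given times in milliseconds.

    round_trip_ms    : client-observed round trip (assumed input)
    server_delta_ms  : time the server held the request before replying
    network_limit_ms : hypothetical threshold for acceptable network time
    server_limit_ms  : hypothetical threshold for acceptable server time
    """
    if server_delta_ms > round_trip_ms:
        raise ValueError("server delta cannot exceed the round trip")

    # Whatever part of the round trip was not spent in the server is
    # attributed to the network.
    network_ms = round_trip_ms - server_delta_ms

    slow_network = network_ms > network_limit_ms
    slow_server = server_delta_ms > server_limit_ms

    if slow_network and slow_server:
        return "both"
    if slow_network:
        return "network"
    if slow_server:
        return "server"
    return "neither"
```

For example, triage(120.0, 100.0, 50.0, 50.0) points at the server, while
triage(120.0, 10.0, 50.0, 50.0) points at the network. It says nothing about
which component inside either area is at fault. That is left to the
specialists.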
I am adding some of this verbiage to section 1.4. Please see below:
CURRENT
-----------
1.4 Rationale for defined solution
The current IPv6 specification does not provide timing nor a similar
field in the IPv6 main header or in any extension header. So, we
define the IPv6 Performance and Diagnostic Metrics destination option
(PDM).
Advantages include:
1. Real measure of actual transactions.
2. Independence from transport layer protocols.
3. Ability to span organizational boundaries with consistent
instrumentation
4. No time synchronization needed between session partners
5. Ability to handle all transport protocols (TCP, UDP, SCTP, etc)
in a uniform way
The PDM provides the ability to determine quickly if the (latency)
problem is in the network or in the server (application). More
intermediate measurements may be needed if the host or network
discrimination is not sufficient. At the client, TCP/IP stack time
vs. application time may still need to be broken out by client
software.
NEW
-----------
1.4 Rationale for defined solution
The current IPv6 specification does not provide timing nor a similar
field in the IPv6 main header or in any extension header. So, we
define the IPv6 Performance and Diagnostic Metrics destination option
(PDM).
Advantages include:
1. Real measure of actual transactions.
2. Independence from transport layer protocols.
3. Ability to span organizational boundaries with consistent
instrumentation
4. No time synchronization needed between session partners
5. Ability to handle all transport protocols (TCP, UDP, SCTP, etc)
in a uniform way
The PDM provides the ability to determine quickly if the (latency)
problem is in the network or in the server (application). That is, it is a fast
way to do triage.
One of the important functions of PDM is to allow you to quickly dispatch the
right set of diagnosticians. Within network or server latency, there may be
many components. The job of the diagnostician is to rule each one out until the
culprit is found.
How PDM fits into this diagnostic picture is that PDM will quickly tell you how
to escalate. PDM will point to either the network area or the server area.
Within the server latency, PDM does not tell you if the bottleneck is in the IP
stack or the application or buffer allocation. Within the network latency, PDM
does not tell you which of the network segments or middle boxes is at fault.
What PDM will tell you is whether the problem is in the network or the server.
In our experience, there is often a different group which is involved to
troubleshoot the problem depending on the nature of the problem. That is, the
problem may be escalated to the application developers or the team that deals
with the routers and infrastructure. Both the network group and the
application group have quite a few specialized tools at their disposal to
further investigate their own areas. What is missing is the first step, which
PDM provides.
In our experience, valuable time is often lost at this first stage of triage.
PDM is expected to reduce this time substantially.
Thanks,
Nalini Elkins
Inside Products, Inc.
www.insidethestack.com
(831) 659-8360
________________________________
From: jouni korhonen <jouni.nos...@gmail.com>
To: General Area Review Team <gen-art@ietf.org>;
draft-ietf-ippm-6man-pdm-option....@ietf.org
Sent: Friday, September 23, 2016 11:14 AM
Subject: Gen-ART review of
I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please wait for direction from your
document shepherd or AD before posting a new version of the draft.
For more information, please see the FAQ at
<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.
Document: draft-ietf-ippm-6man-pdm-option-05
Reviewer: Jouni Korhonen
Review Date: 9/23/2016
IETF LC End Date: 2016-09-28
IESG Telechat date: (if known)
Summary: The draft needs some work.
Major issues:
I have two technical issues here:
1) There is no mention of what is the time reference plane for internal time
stamping. All other timing and synchronization related documents I am aware of
(at least outside IETF) describe it very clearly where in the processing/packet
handling the time stamp is to be taken. Now the document gives me no idea as an
implementer where that should take place. At least it makes it hard to
calculate the *network* RTT precisely.
2) The PDM option relation to actual "server" time is somewhat confusing and
the 5-tuple does not allow me to detect the real relationship between the
server/application action that caused the generation of the packet and the PDM
within the packet. This is specifically an issue with transport/application
protocols that multiplex/interleave multiple application streams into one
transport. I have no idea of the actual individual application time since the
packets get generated independent of the processing of a single thread. I would
welcome some discussion around here. Section 1.4 last paragraph is going to
this direction but is not sufficient IMHO.
Minor issues:
1) This is a larger editorial issue. The document is far too long with a lot of
repetition considering it describes only one IPv6 destination option. It is a
writing style issue and I am fully aware of that. I have proposals how to cut
text in the editorial comments section.
2) Section 1.2 3rd paragraph talks about IoT and that speed matters there. I
find this too generalized statement. There are many other things that matter in
this application domain and speed might not be that important as being able to
send/receive that one to two bytes of data in a given time window. I suggest
removing this paragraph.
Nits/editorial comments:
1) Section 1.4 numbered list: add missing full stops.
2) Section 3.2: remove
"The 5-tuple consists of
the source and destination IP addresses, the source and destination
ports, and the upper layer protocol (ex. TCP, ICMP, etc)."
since this is unnecessary repetition.
3) Section 3.2: remove
"Operating systems MUST NOT implement a single
counter for all connections."
Seems again like unnecessary repetition to previous sentence.
4) Section 3.2 again unnecessary repetition of IPv6 basics that can be read
from RFC2460. Suggest strongly to remove:
"This indicates the
following processing requirements:
00 - skip over this option and continue processing the header.
RFC2460 [RFC2460] defines other values for the Option Type field.
These MUST NOT be used in the PDM."
and
"The
possible values are as follows:
0 - Option Data does not change en-route
1 - Option Data may change en-route
The three high-order bits described above are to be treated as part
of the Option Type, not independent of the Option Type. That is, a
particular option is identified by a full 8-bit Option Type, not just
the low-order 5 bits of an Option Type."
5) Section 3.3 same as in comment 4). Suggest strongly removing:
"This follows the order defined in RFC2460 [RFC2460]
IPv6 header
Hop-by-Hop Options header
Destination Options header <--------
Routing header
Fragment header
Authentication header
Encapsulating Security Payload header
Destination Options header <------------
upper-layer header"
6) Suggest removing entire Section 3.4 and moving the following text to Section
3.3:
"PDM MUST be placed before the ESP header in
order to work. If placed before the ESP header, the PDM header will
flow in the clear over the network thus allowing gathering of
performance and diagnostic data without sacrificing security."
7) Section 3.6 suggest removing the following text. I see no value it would add
to what has already been said:
"As with all other destination options extension headers, the PDM is
for destination nodes only. As specified above, intermediate devices
MUST neither set nor modify this field."
8) Section 3.6 suggest removing the following 5-tuple text as it has already
been described earlier in Section 2:
"The 5-tuple is:
SADDR : IP address of the sender SPORT : Port for sender DADDR : IP
address of the destination DPORT : Port for destination PROTC :
Protocol for upper layer (ex. TCP, UDP, ICMP)"
9) Sections 4.2 and 4.3 suggest removing them entirely. I do not see what value these
sections add. I acknowledge they are good to know information of timer hardware
implementation difference but do not really add value on the on-wire encoding
of the PDM option.
10) Section 4.4 suggest removing the entire section. Time Base was already
described in detail enough in Section 3.2.
11) Section 4.5 time base for picoseconds is 11 not 00.
12) Section 4.5 suggest removing the following text, since it does not add any
more clarity to what has already been said in my opinion. This is because all
the examples follow nice nybble increment in scaling:
"Sample binary values (high order 16 bits taken)
1 psec 1 0001
1 nsec 3E8 0011 1110 1000
1 usec F4240 1111 0100 0010 0100 0000
1 msec 3B9ACA00 0011 1011 1001 1010 1100 1010 0000 0000
1 sec E8D4A51000 1110 1000 1101 0100 1010 0101 0001 0000 0000 0000"
12) Section 4.6 I do not understand why this section is here. I strongly
suggest removing it. Sections 4.5 and 3.2 already describe how I would encode
the delta time using scaling as a separate fields not embedded (option fields
ScaleDTLR and ScaleDTLS). Did I misunderstand something here?
13) Section 5 suggest removing the following text because of it repeating what
has already been said earlier:
"Each packet, in addition to the PDM contains information on the
sender and receiver. As discussed before, a 5-tuple consists of:
SADDR : IP address of the sender
SPORT : Port for sender
DADDR : IP address of the destination
DPORT : Port for destination
PROTC : Protocol for upper layer (ex. TCP, UDP, ICMP)
It should be understood that the packet identification information is
in each packet. We will not repeat that in each of the following
steps."
14) Section 5.3 suggest merging the following text into one example and do
necessary rewording. There is no need to do the same calculation twice on
almost adjacent lines:
"Sending time : packet 2 - receive time : packet 1
We will call the result of this calculation: Delta Time Last Received
(DELTATLR)
That is:
Delta Time Last Received = (Sending time: packet 2 - receive time:
packet 1)"
15) Expand RTT and PSN on their first use.
Phew.. after all this I found the document good reading and most likely a
useful tool to be used.
Regards,
Jouni
_______________________________________________
Gen-art mailing list
Gen-art@ietf.org
https://www.ietf.org/mailman/listinfo/gen-art