Re: [Gen-art] Gen-ART review of draft-ietf-ippm-6man-pdm-option: Server Time

nalini.elkins Tue, 21 Feb 2017 10:00:17 -0800

Jouni,

Here is the thread for your second major comment:


Major comment #2 from you:

>2) The PDM option relation to actual "server" time is somewhat confusing and
>the 5-tuple does not allow me to detect the real relationship between the
>server/application action that caused the generation of the packet and the
>PDM within the packet. This is specifically an issue with
>transport/application protocols that multiplex/interleave multiple
>application streams into one transport. I have no idea of the actual
>individual application time since the packets get generated independent of
>the processing of a single thread. I would welcome some discussion around
>here. Section 1.4 last paragraph is going to this direction but is not
>sufficient IMHO.

Yes, you are, of course, correct that all traffic will flow between the 
matching ports at the two endpoints. The 5-tuples will match regardless of the 
application.

The thing is that we never intended that PDM would distinguish between 
applications using the same 5-tuple.  That is, it is a feature, not a bug.

What PDM WILL tell you is whether the problem is in the network or the host.    
In our experience, which is primarily on networks for large data centers, there 
is a different group which is involved to troubleshoot the problem depending on 
the nature of the problem.  That is, do I get the application developers on the 
line or the team that deals with the routers & infrastructure.

One of the important functions of PDM is to allow you to do quick triage so 
that you can get the right SWAT team going.   PDM does not tell you if the 
problem is in the IP stack or the application or buffer allocation.   PDM also 
does not tell you which of the network segments or middle boxes is at fault.  
The reason for PDM is to get the right specialists in place who can then be 
dispatched to investigate their area.

In our experience, valuable time is often lost at this first stage of triage.   
Both the network group and the application group have quite a few specialized 
tools at their disposal to further investigate their own areas.
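
To make the triage step concrete, here is a minimal sketch of the arithmetic, not the draft's normative procedure. It assumes the client has measured the elapsed time between sending a request and receiving the response, and that the server has reported its own delta (send time minus receive time, the draft's Delta Time Last Received). The function names, the microsecond unit, and the two thresholds are hypothetical and exist only for illustration.

```python
def split_latency(client_elapsed_us, server_delta_us):
    """Split a client-observed round trip into (network, server) time.

    client_elapsed_us: time at the client between sending the request and
                       receiving the response (assumed measured locally).
    server_delta_us:   delta reported by the server, i.e. its send time
                       minus its receive time for the same exchange.
    """
    if server_delta_us < 0 or client_elapsed_us < server_delta_us:
        raise ValueError("inconsistent measurements")
    network_us = client_elapsed_us - server_delta_us
    return network_us, server_delta_us


def triage(client_elapsed_us, server_delta_us,
           network_limit_us, server_limit_us):
    """Return which group(s) to call: 'network', 'server', both, or none.

    The two limits are hypothetical site-specific thresholds.
    """
    network_us, server_us = split_latency(client_elapsed_us, server_delta_us)
    escalate_to = []
    if network_us > network_limit_us:
        escalate_to.append("network")
    if server_us > server_limit_us:
        escalate_to.append("server")
    return escalate_to


# A 120 ms round trip of which the server accounts for 100 ms.
print(triage(120_000, 100_000, network_limit_us=50_000, server_limit_us=50_000))
```

Note that the result only names the area to escalate to. It says nothing about which application sharing the 5-tuple, or which network segment, is responsible.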

I am adding some of this verbiage to section 1.4.   Please see below:

CURRENT
-----------

1.4 Rationale for defined solution 

The current IPv6 specification does not provide timing nor a similar 
field in the IPv6 main header or in any extension header. So, we 
define the IPv6 Performance and Diagnostic Metrics destination option 
(PDM). 

Advantages include: 

1. Real measure of actual transactions. 
2. Independence from transport layer protocols. 
3. Ability to span organizational boundaries with consistent 
instrumentation 
4. No time synchronization needed between session partners 
5. Ability to handle all transport protocols (TCP, UDP, SCTP, etc) 
in a uniform way 

The PDM provides the ability to determine quickly if the (latency) 
problem is in the network or in the server (application). More 
intermediate measurements may be needed if the host or network 
discrimination is not sufficient. At the client, TCP/IP stack time 
vs. application time may still need to be broken out by client 
software. 


NEW
-----------

1.4 Rationale for defined solution 

The current IPv6 specification does not provide timing nor a similar 
field in the IPv6 main header or in any extension header. So, we 
define the IPv6 Performance and Diagnostic Metrics destination option 
(PDM). 

Advantages include: 

1. Real measure of actual transactions. 
2. Independence from transport layer protocols. 
3. Ability to span organizational boundaries with consistent 
instrumentation 
4. No time synchronization needed between session partners 
5. Ability to handle all transport protocols (TCP, UDP, SCTP, etc) 
in a uniform way 

The PDM provides the ability to determine quickly if the (latency) 
problem is in the network or in the server (application).  That is, it 
is a fast way to do triage.

One of the important functions of PDM is to allow you to quickly 
dispatch the right set of diagnosticians.  Within network or server 
latency, there may be many components.  The job of the diagnostician 
is to rule each one out until the culprit is found.

How PDM fits into this diagnostic picture is that PDM will quickly 
tell you how to escalate.  PDM will point to either the network area 
or the server area.  Within the server latency, PDM does not tell you 
if the bottleneck is in the IP stack or the application or buffer 
allocation.  Within the network latency, PDM does not tell you which 
of the network segments or middle boxes is at fault.  What PDM will 
tell you is whether the problem is in the network or the server.

In our experience, there is often a different group which is involved 
to troubleshoot the problem depending on the nature of the problem.  
That is, the problem may be escalated to the application developers 
or the team that deals with the routers and infrastructure.  Both the 
network group and the application group have quite a few specialized 
tools at their disposal to further investigate their own areas.  What 
is missing is the first step, which PDM provides.

In our experience, valuable time is often lost at this first stage of 
triage.  PDM is expected to reduce this time substantially.

Thanks,

Nalini Elkins
Inside Products, Inc.
www.insidethestack.com
(831) 659-8360

________________________________

From: jouni korhonen <[email protected]>
To: General Area Review Team <[email protected]>; 
[email protected] 
Sent: Friday, September 23, 2016 11:14 AM
Subject: Gen-ART review of



I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please wait for direction from your
document shepherd or AD before posting a new version of the draft.

For more information, please see the FAQ at

<http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

Document: draft-ietf-ippm-6man-pdm-option-05
Reviewer: Jouni Korhonen
Review Date: 9/23/2016
IETF LC End Date: 2016-09-28
IESG Telechat date: (if known)

Summary: The draft needs some work.

Major issues:


I have two technical issues here:


1) There is no mention of what is the time reference plane for internal time 
stamping. All other timing and synchronization related documents I am aware of 
(at least outside IETF) describe it very clearly where in the processing/packet 
handling the time stamp is to be taken. Now the document gives me no idea as an 
implementer where that should take place. At least it makes it hard to 
calculate the *network* RTT precisely.


2) The PDM option relation to actual "server" time is somewhat confusing and 
the 5-tuple does not allow me to detect the real relationship between the 
server/application action that caused the generation of the packet and the PDM 
within the packet. This is specifically an issue with transport/application 
protocols that multiplex/interleave multiple application streams into one 
transport. I have no idea of the actual individual application time since the 
packets get generated independent of the processing of a single thread. I would 
welcome some discussion around here. Section 1.4 last paragraph is going to 
this direction but is not sufficient IMHO.


Minor issues:


1) This is a larger editorial issue. The document is far too long with a lot of 
repetition considering it describes only one IPv6 destination option. It is a 
writing style issue and I am fully aware of that. I have proposals how to cut 
text in the editorial comments section.


2) Section 1.2 3rd paragraph talks about IoT and that speed matters there. I 
find this too generalized statement. There are many other things that matter in 
this application domain and speed might not be that important as being able to 
send/receive that one to two bytes of data in a given time window. I suggest 
removing this paragraph.


Nits/editorial comments:


1) Section 1.4 numbered list: add missing full stops.


2) Section 3.2: remove
  "The 5-tuple consists of
   the source and destination IP addresses, the source and destination
   ports, and the upper layer protocol (ex. TCP, ICMP, etc)."

since this is unnecessary repetition.

3) Section 3.2: remove
  "Operating systems MUST NOT implement a single
   counter for all connections."

Seems again like unnecessary repetition to previous sentence.


4) Section 3.2 again unnecessary repetition of IPv6 basics that can be read 
from RFC2460. Suggest strongly to remove:
  "This indicates the
   following processing requirements:

   00 - skip over this option and continue processing the header.

   RFC2460 [RFC2460] defines other values for the Option Type field.
   These MUST NOT be used in the PDM."


and

  "The
   possible values are as follows:

   0 - Option Data does not change en-route

   1 - Option Data may change en-route

   The three high-order bits described above are to be treated as part
   of the Option Type, not independent of the Option Type.  That is, a
   particular option is identified by a full 8-bit Option Type, not just
   the low-order 5 bits of an Option Type."


5) Section 3.3 same as in comment 4). Suggest strongly removing:
  "This follows the order defined in RFC2460 [RFC2460]
                 IPv6 header

                 Hop-by-Hop Options header

                 Destination Options header  <--------

                 Routing header

                 Fragment header

                 Authentication header

                 Encapsulating Security Payload header

                 Destination Options header <------------

                 upper-layer header"


6) Suggest removing entire Section 3.4 and moving the following text to Section 
3.3:
  "PDM MUST be placed before the ESP header in
   order to work.  If placed before the ESP header, the PDM header will
   flow in the clear over the network thus allowing gathering of
   performance and diagnostic data without sacrificing security."


7) Section 3.6 suggest removing the following text. I see no value it would add 
to what has already been said:
  "As with all other destination options extension headers, the PDM is
   for destination nodes only. As specified above, intermediate devices
   MUST neither set nor modify this field."


8) Section 3.6 suggest removing the following 5-tuple text as it has already 
been described earlier in Section 2:
  "The 5-tuple is:

   SADDR : IP address of the sender SPORT : Port for sender DADDR : IP
   address of the destination DPORT : Port for destination PROTC :
   Protocol for upper layer (ex. TCP, UDP, ICMP)"


9) Sections 4.2 and 4.3 suggest removing them entirely. I do not see what value these 
sections add. I acknowledge they are good to know information of timer hardware 
implementation difference but do not really add value on the on-wire encoding 
of the PDM option.


10) Section 4.4 suggest removing the entire section. Time Base was already 
described in detail enough in Section 3.2.


11) Section 4.5 time base for picoseconds is 11 not 00.


12) Section 4.5 suggest removing the following text, since it does not add any 
more clarity to what has already been said in my opinion. This is because all 
the examples follow nice nybble increment in scaling:
  "Sample binary values (high order 16 bits taken)

   1 psec            1                                              0001
   1 nsec          3E8                                    0011 1110 1000
   1 usec        F4240                          1111 0100 0010 0100 0000
   1 msec     3B9ACA00           0011 1011 1001 1010 1100 1010 0000 0000
   1 sec    E8D4A51000 1110 1000 1101 0100 1010 0101 0001 0000 0000 0000"


12) Section 4.6 I do not understand why this section is here. I strongly 
suggest removing it. Sections 4.5 and 3.2 already describe how I would encode 
the delta time using scaling as a separate fields not embedded (option fields 
ScaleDTLR and ScaleDTLS). Did I misunderstand something here?


13) Section 5 suggest removing the following text because of it repeating what 
has already been said earlier:
  "Each packet, in addition to the PDM contains information on the
   sender and receiver. As discussed before, a 5-tuple consists of:

      SADDR : IP address of the sender
      SPORT : Port for sender
      DADDR : IP address of the destination
      DPORT : Port for destination
      PROTC : Protocol for upper layer (ex. TCP, UDP, ICMP)

   It should be understood that the packet identification information is
   in each packet. We will not repeat that in each of the following
   steps."


14) Section 5.3 suggest merging the following text into one example and do 
necessary rewording. There is no need to do the same calculation twice on 
almost adjacent lines:

  "Sending time : packet 2 - receive time : packet 1

   We will call the result of this calculation: Delta Time Last Received
   (DELTATLR)

   That is:

   Delta Time Last Received = (Sending time: packet 2 - receive time:
   packet 1)"


15) Expand RTT and PSN on their first use.
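
For readers who want to check the Section 4.5 sample values quoted in nit 12 above, the following sketch recomputes them. It converts each unit to picoseconds, prints the hexadecimal form, and then keeps the high-order 16 bits. The helper name `scale_to_16_bits` is hypothetical, and treating the scale as "the number of low-order bits dropped" is an assumption made for illustration, not a statement of the draft's exact encoding.

```python
# Picoseconds per unit, matching the rows of the quoted sample table.
PSEC_PER_UNIT = {
    "psec": 1,
    "nsec": 10**3,
    "usec": 10**6,
    "msec": 10**9,
    "sec": 10**12,
}


def scale_to_16_bits(picoseconds):
    """Keep the high-order 16 bits of a value.

    Returns (value, scale), where scale is the number of low-order bits
    dropped (an assumed convention for this sketch).
    """
    scale = max(0, picoseconds.bit_length() - 16)
    return picoseconds >> scale, scale


for unit, psec in PSEC_PER_UNIT.items():
    value, scale = scale_to_16_bits(psec)
    print(f"1 {unit:<4} = {psec:>10X} psec -> value {value:04X}, scale {scale}")
```

The hexadecimal column reproduces the table (3E8, F4240, 3B9ACA00, E8D4A51000); only the larger values lose low-order bits when truncated to 16 bits.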


Phew.. after all this I found the document good reading and most likely a 
useful tool to be used.


Regards,

   Jouni

_______________________________________________
Gen-art mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/gen-art
