Note: I'm forwarding this message with the PNG attachments removed,
as I was politely (and deservedly) reminded that big attachments are a
no-no on a mailing list. Here goes the message:

> > The "correction" field inserted by the RuggedCom switch contains
> > values between 10 and 20 million raw units, that's some 150 to 300ns.
> > Sounds about appropriate. Makes me wonder if the contents of the PTP
> > traffic can make the Intel hardware puke :-/ The actual jitter, or
> > the non-zero correction field... it's strange.
> > 
Actually... this is probably wrong. The value in the correction.ns
field is about 10 to 20 million ns, i.e. 10 to 20 milliseconds. I can
see the raw value in the frame (in hex), and that's how both Wireshark
and ptpTrackHound interpret it, in unison.
And one vendor's tech support insists that a correction value of 20 ms
is perfectly all right in a TC switch, due to SW processing of the PTP
packets. Yikes, what?
Or, is there any chance that my sniffing rig is broken?
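For reference, IEEE 1588 encodes correctionField as a signed 64-bit
integer in units of 2^-16 ns, so the same number reads very differently
depending on whether the 2^-16 scaling has already been applied. A quick
sanity check in plain Python (15 million is just a mid-range value from
this thread):

```python
# IEEE 1588 correctionField: signed 64-bit, in units of 2^-16 nanoseconds.
SCALE = 1 << 16  # 65536

raw = 15_000_000  # a mid-range value seen in the captures

ns_if_raw = raw / SCALE         # if this were the raw on-the-wire value
ms_if_scaled = raw / 1_000_000  # if this is the already-scaled correction.ns

print(f"as raw 2^-16 units: {ns_if_raw:.1f} ns")   # ~228.9 ns -- plausible for a TC
print(f"as scaled ns:       {ms_if_scaled:.1f} ms")  # 15.0 ms -- the spooky reading
```

That is exactly the 150-300 ns vs. 10-20 ms discrepancy from the quoted
message above.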

I've captured the PTP traffic with libpcap,
A) either with ptp4l running in software mode as a client
    to a TC switch (with a Meinberg GM as the next upstream hop),
B) or as a pure sniffer, listening to traffic between a 3rd-party
   client and the TC. The Intel NIC does have PTP support, but I
   understand that it was turned off at the time of the capture.
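To rule out the display side, the correctionField can also be checked
straight in the captured bytes: it sits at offset 8 of the common PTP
header, eight bytes, big-endian signed, in units of 2^-16 ns. A minimal
decoder sketch (the example header below is fabricated, not taken from
my capture):

```python
import struct

def correction_ns(ptp_msg: bytes) -> float:
    """correctionField lives at offset 8 of the common PTP header:
    8 bytes, big-endian signed, in units of 2^-16 ns.
    ptp_msg is the PTP payload, i.e. everything after the Ethernet
    header (14 bytes, or 18 with an 802.1Q tag) for L2 transport."""
    (raw,) = struct.unpack_from(">q", ptp_msg, 8)
    return raw / (1 << 16)

# Fabricated 34-byte PTP common header carrying a 10 ms correction:
hdr = bytearray(34)
struct.pack_into(">q", hdr, 8, 10_000_000 * (1 << 16))
print(correction_ns(bytes(hdr)) / 1e6, "ms")  # 10.0 ms
```

If a decoder like this agrees with Wireshark and ptpTrackHound on the
same pcap, the 10-20 ms is really on the wire and not a display artifact.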

Any chance that the Intel NIC hardware would mangle the correction
field? (I hope not - after some debate in another thread, the 10-20 ms
really seem all right, even if spooky.)

I'll probably have to borrow a proper "meter" device anyway :-/

I have some other potentially interesting observations, relevant to 
ptp4l and Intel HW.

There are two GMs in play: 

GM A (older), which correlated with a problem reported on site by a 
particular 3rd-party PTP slave. Presumed buggy.

GM B (younger), whose deployment correlated with the 3rd-party slave 
becoming happy. Presumed healthy.

The 3rd-party slave is a black box: expensive, and presumably a
high-quality implementation.
Let me focus on the behavior observed in ptp4l with HW accel.


I actually tried ptp4l with HW support under several slightly
different scenarios. L2 multicast and the 2-step P2P mechanism were
common to all of them, but the details differed.
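For reference, the common part of those scenarios maps onto a ptp4l
configuration roughly like this (option names as in linuxptp's
default.cfg; treat it as a sketch, not my exact config file):

```
[global]
network_transport  L2
delay_mechanism    P2P
twoStepFlag        1
time_stamping      hardware
```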

1) with "grandmaster B", directly attached at 1 Gbps, configured for 
C37.238-2017 (including ALTERNATE_TIME_OFFSET_INDICATOR_TLV),
both ends without a VLAN tag, in my lab. That worked for the most
part: ptp4l would throw maybe 8 TX timeouts during one night (10
hours).

2) with "grandmaster B", on site, configured for C37.238-2017 
(including ALTERNATE_TIME_OFFSET_INDICATOR_TLV),
both ends without a VLAN tag, through a PTP-capable switch
(the one adding 10-20 ms of "correction").
Here, ptp4l with HW accel would never stop choking on TX
timeouts. Sometimes it went 3 to 10 PDelay transactions without a
timeout; sometimes it ran timeout after timeout.
There was 3rd-party multicast traffic on the network (IEC61850 
GOOSE).

3) with "grandmaster A", on site, direct attached, configured for 
C37.238-2011 (no local timezone TLV), but *with* a VLAN tag 
containing ID=0 configured on the GM, and *without* VLAN tag on the 
ptp4l client, ptp4l would not synchronize to the GM. In the packet
trace I can see all the messages from the GM, and ptp4l does respond 
to the master's PDelay Requests, but the GM does *not* respond to 
ptp4l's PDelay Requests.
=> I consider this a misconfiguration on my part (PEBKAC),
even though... theoretically... VLAN ID=0 means "this packet has 
802.1p priority assigned, but does not belong to a VLAN".
The GM *could* be a little more tolerant / liberal in what it accepts
:-) Then again, I do not know the wording of the 2011 "power 
profile".

4) with "grandmaster A", direct attached, back home in the lab, 
configured for C37.238-2011 (no local timezone TLV), but *with* a 
VLAN tag containing ID=0 configured on the GM, and *with* a VLAN tag 
ID=0 on the ptp4l client (created a VLAN subinterface eth0.0), 
ptp4l now RUNS LIKE A CHEETAH FOR DAYS!
No TX timeouts in the log.

=> the Intel NIC hardware is possibly sensitive to "irrelevant" 
contents in the traffic. I can come up with the following candidate 
culprits/theories: 
- absence of the VLAN tag
- correction values of 10-20 ms
- other mcast traffic interfering
- higher/different actual jitter in the messages?

> Which device (and driver) are you using? (I can't see it in the history).
> 
On the ptp4l client?
The PC is a pre-production engineering-sample panel PC by Arbor/TW,
with an Intel Skylake mobile CPU; the NIC that I'm using is an i219LM
integrated on the motherboard (not sure if this has the MAC on-chip
within the PCH/southbridge, or if it's a stand-alone NIC). Of the two
Intel NIC chips, this one is the more precise. The kernel is a fresh
vanilla 4.13.12 and the e1000e driver came with it.
I'm attaching a dump of dmesg and lspci. Ask for more if you want.

Frank Rysanek


Attachment: arbor-dmesg.txt (text, 56955 bytes, 6 Dec 2017)

Attachment: arbor-lspci.txt (text, 1007 bytes, 6 Dec 2017)

_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel
