Hey all, I just wanted to wrap up this thread in case anyone else comes across
this issue. It turns out the source of my problems was the clock I was using.
The VersaSync wasn't sending delay-response messages, which is why the slave
systems would never actually sync to it. This was caused by having both
Ethernet ports of the VersaSync on the same network. The VersaSync's default
Ethernet port is eth0 (and can't be changed right now due to a bug), but PTP is
only available on eth1. Disconnecting eth0 or setting its IP address outside
the range of eth1's network is currently the best workaround until the option
to change the default Ethernet port is fixed in the VersaSync's web GUI.
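
For anyone who wants to double-check the workaround, here is a rough sketch of
a shell helper that tells whether two IPv4 addresses fall inside the same
network for a given prefix length. The function names and the addresses are
made up for illustration; substitute the real eth0 and eth1 settings from the
VersaSync's web GUI.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeeds (exit status 0) if both addresses share the same /prefix network.
# Usage: same_subnet ADDR1 ADDR2 PREFIX_LENGTH
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Hypothetical addresses for eth0 and eth1, not taken from the real setup.
if same_subnet 192.168.1.10 192.168.1.20 24; then
    echo "overlap: eth0 and eth1 are on the same network"
else
    echo "ok: eth0 and eth1 are on different networks"
fi
```

If the helper reports an overlap, that is the situation described above where
the delay-response messages never reached the slaves.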


Thank you Richard and David for your help!


Adam


________________________________
From: Essling, Adam M <adam.essl...@udri.udayton.edu>
Sent: Friday, September 20, 2019 12:20 PM
To: David Mirabito
Cc: linuxptp-users@lists.sourceforge.net
Subject: Re: [Linuxptp-users] recvmsg failed error on multiple slave systems

Another brief update here,

I'm still waiting to hear back from the clock manufacturer. In the meantime, I 
did update the clock's on-board software from 1.3.1d to 1.3.1k, the latest 
release. Unfortunately the slave systems still do not actually enter slave 
mode, the same as before.

I did think of another question. Do I need to run an instance of ptp4l on the 
actual VersaSync itself in order for the slave systems to communicate properly 
with it? I don't know if I should or even can do this, but I thought I should 
at least ask. The only configuration I've done on the VersaSync has been via 
the web UI.

Thanks,
Adam
________________________________
From: Essling, Adam M [adam.essl...@udri.udayton.edu]
Sent: Wednesday, September 11, 2019 2:49 PM
To: David Mirabito
Cc: linuxptp-users@lists.sourceforge.net
Subject: Re: [Linuxptp-users] recvmsg failed error on multiple slave systems

Hey David,

Sorry for the delay in responding - I was traveling. I have a few updates!

1. The slave systems never enter SLAVE. It appears as though they are stuck in 
UNCALIBRATED, even if I let it sit for a long time. The only thing that happens 
is occasionally the "selected best master clock" message will be displayed 
(still with the correct MAC of the Versasync at least).

2. I completely flushed iptables and tried to run ptp4l but still was getting 
the same results as in my previous post.

3. I ran tcpdump while ptp4l was running and this time I did notice some UDP
messages with 0 length, so I added the rule you suggested. That stopped the
"port 1: bad message" output, but the slave still seems to be stuck in
UNCALIBRATED.

4. I was able to remove the Versasync from the network and set [system2] as a 
master clock with [system1] as a slave to [system2]. This worked perfectly, 
which leads me to believe the issue here is the Versasync Clock. I'm going to 
look into possible firmware updates or some support from the clock manufacturer.
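
In case it helps anyone watching for the same symptom, below is a small sketch
of how the last port state could be pulled out of ptp4l's output instead of
reading the log by eye. The function name is made up, and the pattern only
assumes the "port N: OLD to NEW on EVENT" line format seen in the ptp4l output
quoted in this thread.

```shell
# Print the most recent state that a ptp4l port transitioned to.
# Reads ptp4l log lines on stdin; $1 is the port number (default 1).
last_port_state() {
    port=${1:-1}
    sed -n "s/.*port $port: [A-Z_]* to \([A-Z_]*\) on .*/\1/p" | tail -n 1
}

# Sample lines copied from the ptp4l output quoted in this thread.
sample='ptp4l[4756130.278]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[4756131.952]: port 1: bad message
ptp4l[4756135.951]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[4756135.952]: port 1: bad message'

state=$(printf '%s\n' "$sample" | last_port_state 1)
echo "port 1 is in $state"
```

A port that keeps reporting UNCALIBRATED here and never SLAVE matches what is
described in item 1.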

Adam
________________________________
From: David Mirabito [davi...@arista.com]
Sent: Wednesday, September 04, 2019 4:47 PM
To: Essling, Adam M
Cc: Richard Cochran; linuxptp-users@lists.sourceforge.net
Subject: Re: [Linuxptp-users] recvmsg failed error on multiple slave systems

Hey Adam,

That's interesting, if the patch turned "FAULTY" into "bad message" logs then 
it feels somewhat related. The change should only affect packets where recvmsg 
successfully returns zero, as you see.
After entering UNCALIBRATED did it eventually SLAVE? (despite the bad messages 
being sent each second - annoying for log noise but at least no longer 
resetting ptp4l's state machine, you'd see the same if something was spamming 1 
byte payloads that still aren't a valid PTP header ...).

Did you run tcpdump whilst ptp4l was running? It's possible the master doesn't send
(malformed) status requests to non-active clients, or could be something to do 
with multicast subscriptions perhaps. I'd expect the undersized packet to come 
amid a flurry of other management GET messages.

Alternatively, one can drop such undersized packets before they get anywhere
near ptp4l. I'm not super knowledgeable on firewalls, but this seems to do the
trick too, either preventing faults pre-patch or stopping noisy 'bad message'
logs post-patch:

iptables -A INPUT -p udp -m udp --dport 320 -m length --length 28 -j DROP
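
For reference, the length of 28 in the rule above matches an empty UDP
datagram over IPv4: a 20-byte IPv4 header (assuming no IP options) plus an
8-byte UDP header and zero bytes of payload. The sketch below only reproduces
that arithmetic and prints the rule rather than installing it. The variable
names are illustrative, and it assumes the length match counts the whole IPv4
packet.

```shell
ipv4_header=20   # bytes, assuming an IPv4 header without options
udp_header=8     # bytes, fixed size of a UDP header
payload=0        # bytes, the empty datagrams discussed in this thread

empty_len=$(( ipv4_header + udp_header + payload ))

# Print the rule for review; run it by hand (as root) if it looks right.
echo "iptables -A INPUT -p udp -m udp --dport 320 -m length --length $empty_len -j DROP"
```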

I have seen this in the wild a couple of times, with PTP master appliance
devices. It's not clear why the tcpdump test was negative, but the appearance
of bad messages still sounds promising.

Cheers,
David

PS: it may also be worth looking for a master firmware update. I've heard 
rumours one may have been in the pipeline earlier this year, but no 
confirmation if that occurred and/or if it addressed this specific issue. We 
carry the linked patch to ptp4l just to be sure.

On Thu, 5 Sep 2019 at 01:26, Essling, Adam M
<adam.essl...@udri.udayton.edu> wrote:
Thank you both for responding so quickly!

David,
I tried the patch you recommended but ended up getting this result:
ptp4l[4756130.278]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[4756130.278]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[4756130.278]: port 1: link up
ptp4l[4756130.952]: port 1: bad message
ptp4l[4756131.951]: port 1: new foreign master 000cec.fffe.0d013a-1
ptp4l[4756131.952]: port 1: bad message
ptp4l[4756132.952]: port 1: bad message
ptp4l[4756133.952]: port 1: bad message
ptp4l[4756134.952]: port 1: bad message
ptp4l[4756135.951]: selected best master clock 000cec.fffe.0d013a
ptp4l[4756135.951]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[4756135.952]: port 1: bad message

Followed by continuous bad messages after that. I checked the tcpdump and none 
of the UDP messages I saw had a zero payload, so I'm not sure what the bad 
messages are.

I also tried running ptp4l v2.0 and I got a slightly different error message:
ptp4l: [8549.759] selected /dev/ptp0 as PTP clock
ptp4l: [8549.761] port 1: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l: [8549.761] port 0: INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l: [8550.221] port 1: new foreign master 000cec.fffe.0d013a-1
ptp4l: [8550.221] recvmsg failed: No such device or address
ptp4l: [8550.222] port 1: recv message failed
ptp4l: [8550.222] port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
...

I'm not sure if it's relevant but I have been able to use ptpd2 with both slave 
systems and the Versasync clock with the same network setup.

Richard,
Here's the info you requested:

[system1] uname -r
4.4.38-tegra

[system1] ethtool -i
driver: eqos
version:
firmware-version:
expansion-rom-version:
bus-info: 2490000.ether_qos
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

[system1] iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy DROP)
target     prot opt source               destination
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere


[system2] uname -r
4.15.0-47-generic

[system2] ethtool -i
driver: igb
version: 5.4.0-k
firmware-version: 3.25, 0x800005d0
expansion-rom-version:
bus-info: 0000:08:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

[system2] iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy DROP)
target     prot opt source               destination
DOCKER-USER  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (2 references)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             172.17.0.2           tcp dpt:5000
ACCEPT     tcp  --  anywhere             172.18.0.2           tcp dpt:9000

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere

Chain DOCKER-USER (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

Thanks,
Adam
________________________________________
From: Richard Cochran [richardcoch...@gmail.com]
Sent: Tuesday, September 03, 2019 11:08 PM
To: Essling, Adam M
Cc: linuxptp-users@lists.sourceforge.net
Subject: Re: [Linuxptp-users] recvmsg failed error on multiple slave systems

On Tue, Sep 03, 2019 at 09:00:08PM +0000, Essling, Adam M wrote:
> Hi, I'm trying to use ptp4l with [system1] and [system2] as slaves with a 
> Spectracom Versasync PTP clock as master. Both slave systems are running 
> Ubuntu 16.04 with ptp4l v1.8. I am using the default ptp4l.conf file. When I 
> run the following command (on system1):
>
> sudo ptp4l -i eth0 -f /etc/linuxptp/ptp4l.conf -m
>
> I get the following output:
> ptp4l: [7769.535] selected /dev/ptp0 as PTP clock
> ptp4l: [7769.537] port 1: INITIALIZING to LISTENING on INITIALIZE
> ptp4l: [7769.537] port 0: INITIALIZING to LISTENING on INITIALIZE
> ptp4l: [7769.537] port 1: link up
> ptp4l: [7770.221] port 1: new foreign master 000cec.fffe.xxxxxx-1

So we did receive an Announce message (I guess on the general port).

> ptp4l: [7770.221] recvmsg failed: No such file or directory

But here recvmsg() returns ENOENT.  Strange.

That error isn't listed on the man page.  I briefly scanned the kernel
stack, and there are indeed a few cases where ENOENT can be returned,
but I didn't see anything that could apply in this case.

Could this possibly be due to a firewall?

> I know the recvmsg failed error has something to do with the ptp4l
> socket but I'm not sure how to go about fixing it. Please let me
> know if more information is needed.

- uname -r
- ethtool -i
- iptables -L

Thanks,
Richard
_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users
