Hey all,

I just wanted to wrap up this thread in case anyone else comes across this issue. The source of my problems turned out to be the clock I was using. The VersaSync wasn't sending delay-response messages, which is why the slave systems never actually synced to it. This happened because both Ethernet ports of the VersaSync were on the same network. The VersaSync's default Ethernet port is eth0 (and can't be changed right now due to a bug), but PTP is only available on eth1. Disconnecting eth0, or setting its IP address outside the range of eth1's network, is currently the best workaround until the option to change the default Ethernet port is fixed in the VersaSync's web GUI.
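For anyone applying the workaround above, here is a minimal Python sketch for checking whether two interface addresses fall in the same IP network. The addresses and prefix lengths are made-up placeholders, not the VersaSync's actual configuration, and `same_network` is a hypothetical helper name; substitute the values shown in your own web UI.

```python
import ipaddress


def same_network(iface_a: str, iface_b: str) -> bool:
    """Return True if the two interface addresses (CIDR form) share or overlap a network."""
    net_a = ipaddress.ip_interface(iface_a).network
    net_b = ipaddress.ip_interface(iface_b).network
    return net_a.overlaps(net_b)


# Hypothetical addresses, for illustration only.
eth1 = "192.168.1.20/24"          # the PTP-capable port
eth0_before = "192.168.1.10/24"   # same network as eth1: the problematic setup
eth0_after = "10.10.10.10/24"     # moved outside eth1's range: the workaround

print(same_network(eth0_before, eth1))  # True
print(same_network(eth0_after, eth1))   # False
```

If the check still reports an overlap after changing eth0, the two ports are still in the same network and the workaround has not taken effect.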
Thank you Richard and David for your help!

Adam

________________________________
From: Essling, Adam M <adam.essl...@udri.udayton.edu>
Sent: Friday, September 20, 2019 12:20 PM
To: David Mirabito
Cc: linuxptp-users@lists.sourceforge.net
Subject: Re: [Linuxptp-users] recvmsg failed error on multiple slave systems

Another brief update here: I'm still waiting to hear back from the clock manufacturer. In the meantime, I updated the clock's on-board software from 1.3.1d to 1.3.1k, the latest release. Unfortunately, the slave systems still do not enter slave mode, the same as before.

I did think of another question. Do I need to run an instance of ptp4l on the VersaSync itself in order for the slave systems to communicate properly with it? I don't know if I should or even can do this, but I thought I should at least ask. The only configuration I've done on the VersaSync has been via the web UI.

Thanks,
Adam

________________________________
From: Essling, Adam M [adam.essl...@udri.udayton.edu]
Sent: Wednesday, September 11, 2019 2:49 PM
To: David Mirabito
Cc: linuxptp-users@lists.sourceforge.net
Subject: Re: [Linuxptp-users] recvmsg failed error on multiple slave systems

Hey David,

Sorry for the delay in responding - I was traveling. I have a few updates!

1. The slave systems never enter SLAVE. They appear to be stuck in UNCALIBRATED, even if I let them sit for a long time. The only thing that happens is that the "selected best master clock" message is occasionally displayed (still with the correct MAC of the Versasync, at least).
2. I completely flushed iptables and ran ptp4l again, but still got the same results as in my previous post.
3. I ran tcpdump while ptp4l was running, and this time I did notice some UDP messages with 0 length, so I added the rule you suggested. That stopped the "port 1: bad message" output, but the slave still seems to be stuck in UNCALIBRATED.
4. I was able to remove the Versasync from the network and set [system2] as a master clock with [system1] as a slave to [system2]. This worked perfectly, which leads me to believe the issue here is the Versasync clock. I'm going to look into possible firmware updates or some support from the clock manufacturer.

Adam

________________________________
From: David Mirabito [davi...@arista.com]
Sent: Wednesday, September 04, 2019 4:47 PM
To: Essling, Adam M
Cc: Richard Cochran; linuxptp-users@lists.sourceforge.net
Subject: Re: [Linuxptp-users] recvmsg failed error on multiple slave systems

Hey Adam,

That's interesting. If the patch turned "FAULTY" into "bad message" logs, then it feels somewhat related. The change should only affect packets where recvmsg successfully returns zero, as you see.

After entering UNCALIBRATED, did it eventually reach SLAVE? (Despite the bad messages being sent each second - annoying for log noise, but at least no longer resetting ptp4l's state machine. You'd see the same if something was spamming 1 byte payloads that still aren't a valid PTP header ...)

Did you run tcpdump whilst ptp4l was running? It's possible the master doesn't send (malformed) status requests to non-active clients, or it could be something to do with multicast subscriptions perhaps. I'd expect the undersized packet to come amid a flurry of other management GET messages.

Alternatively, one can drop such undersized packets before they get anywhere near ptp4l. I'm not super knowledgeable on firewalls, but this seems to do the trick too, either preventing faults pre-patch or stopping noisy 'bad message' logs post-patch:

    iptables -A INPUT -p udp -m udp --dport 320 -m length --length 28 -j DROP

I have seen this in the wild a couple of times, with PTP master appliance devices. It's not clear why the tcpdump test was negative, but the appearance of bad messages still sounds promising.

Cheers,
David

PS: It may also be worth looking for a master firmware update. I've heard rumours one may have been in the pipeline earlier this year, but no confirmation of whether that occurred and/or whether it addressed this specific issue. We carry the linked patch to ptp4l just to be sure.

On Thu, 5 Sep 2019 at 01:26, Essling, Adam M <adam.essl...@udri.udayton.edu> wrote:

Thank you both for responding so quickly!

David,

I tried the patch you recommended but ended up getting this result:

    ptp4l[4756130.278]: port 1: INITIALIZING to LISTENING on INITIALIZE
    ptp4l[4756130.278]: port 0: INITIALIZING to LISTENING on INITIALIZE
    ptp4l[4756130.278]: port 1: link up
    ptp4l[4756130.952]: port 1: bad message
    ptp4l[4756131.951]: port 1: new foreign master 000cec.fffe.0d013a-1
    ptp4l[4756131.952]: port 1: bad message
    ptp4l[4756132.952]: port 1: bad message
    ptp4l[4756133.952]: port 1: bad message
    ptp4l[4756134.952]: port 1: bad message
    ptp4l[4756135.951]: selected best master clock 000cec.fffe.0d013a
    ptp4l[4756135.951]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
    ptp4l[4756135.952]: port 1: bad message

This was followed by continuous bad messages. I checked the tcpdump output and none of the UDP messages I saw had a zero payload, so I'm not sure what the bad messages are.

I also tried running ptp4l v2.0 and got a slightly different error message:

    ptp4l: [8549.759] selected /dev/ptp0 as PTP clock
    ptp4l: [8549.761] port 1: INITIALIZING to LISTENING on INIT_COMPLETE
    ptp4l: [8549.761] port 0: INITIALIZING to LISTENING on INIT_COMPLETE
    ptp4l: [8550.221] port 1: new foreign master 000cec.fffe.0d013a-1
    ptp4l: [8550.221] recvmsg failed: No such device or address
    ptp4l: [8550.222] port 1: recv message failed
    ptp4l: [8550.222] port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
    ...

I'm not sure if it's relevant, but I have been able to use ptpd2 with both slave systems and the Versasync clock with the same network setup.

Richard,

Here's the info you requested:

[system1] uname -r

    4.4.38-tegra

[system1] ethtool -i

    driver: eqos
    version:
    firmware-version:
    expansion-rom-version:
    bus-info: 2490000.ether_qos
    supports-statistics: yes
    supports-test: no
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: no

[system1] iptables -L

    Chain INPUT (policy ACCEPT)
    target                    prot opt source    destination

    Chain FORWARD (policy DROP)
    target                    prot opt source    destination
    DOCKER-USER               all  --  anywhere  anywhere
    DOCKER-ISOLATION-STAGE-1  all  --  anywhere  anywhere
    ACCEPT                    all  --  anywhere  anywhere     ctstate RELATED,ESTABLISHED
    DOCKER                    all  --  anywhere  anywhere
    ACCEPT                    all  --  anywhere  anywhere
    ACCEPT                    all  --  anywhere  anywhere

    Chain OUTPUT (policy ACCEPT)
    target                    prot opt source    destination

    Chain DOCKER (1 references)
    target                    prot opt source    destination

    Chain DOCKER-ISOLATION-STAGE-1 (1 references)
    target                    prot opt source    destination
    DOCKER-ISOLATION-STAGE-2  all  --  anywhere  anywhere
    RETURN                    all  --  anywhere  anywhere

    Chain DOCKER-ISOLATION-STAGE-2 (1 references)
    target                    prot opt source    destination
    DROP                      all  --  anywhere  anywhere
    RETURN                    all  --  anywhere  anywhere

    Chain DOCKER-USER (1 references)
    target                    prot opt source    destination
    RETURN                    all  --  anywhere  anywhere

[system2] uname -r

    4.15.0-47-generic

[system2] ethtool -i

    driver: igb
    version: 5.4.0-k
    firmware-version: 3.25, 0x800005d0
    expansion-rom-version:
    bus-info: 0000:08:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: yes

[system2] iptables -L

    Chain INPUT (policy ACCEPT)
    target                    prot opt source    destination

    Chain FORWARD (policy DROP)
    target                    prot opt source    destination
    DOCKER-USER               all  --  anywhere  anywhere
    DOCKER-ISOLATION-STAGE-1  all  --  anywhere  anywhere
    ACCEPT                    all  --  anywhere  anywhere     ctstate RELATED,ESTABLISHED
    DOCKER                    all  --  anywhere  anywhere
    ACCEPT                    all  --  anywhere  anywhere
    ACCEPT                    all  --  anywhere  anywhere
    ACCEPT                    all  --  anywhere  anywhere     ctstate RELATED,ESTABLISHED
    DOCKER                    all  --  anywhere  anywhere
    ACCEPT                    all  --  anywhere  anywhere
    ACCEPT                    all  --  anywhere  anywhere

    Chain OUTPUT (policy ACCEPT)
    target                    prot opt source    destination

    Chain DOCKER (2 references)
    target                    prot opt source    destination
    ACCEPT                    tcp  --  anywhere  172.17.0.2   tcp dpt:5000
    ACCEPT                    tcp  --  anywhere  172.18.0.2   tcp dpt:9000

    Chain DOCKER-ISOLATION-STAGE-1 (1 references)
    target                    prot opt source    destination
    DOCKER-ISOLATION-STAGE-2  all  --  anywhere  anywhere
    DOCKER-ISOLATION-STAGE-2  all  --  anywhere  anywhere
    RETURN                    all  --  anywhere  anywhere

    Chain DOCKER-ISOLATION-STAGE-2 (2 references)
    target                    prot opt source    destination
    DROP                      all  --  anywhere  anywhere
    DROP                      all  --  anywhere  anywhere
    RETURN                    all  --  anywhere  anywhere

    Chain DOCKER-USER (1 references)
    target                    prot opt source    destination
    RETURN                    all  --  anywhere  anywhere

Thanks,
Adam

________________________________________
From: Richard Cochran [richardcoch...@gmail.com]
Sent: Tuesday, September 03, 2019 11:08 PM
To: Essling, Adam M
Cc: linuxptp-users@lists.sourceforge.net
Subject: Re: [Linuxptp-users] recvmsg failed error on multiple slave systems

On Tue, Sep 03, 2019 at 09:00:08PM +0000, Essling, Adam M wrote:
> Hi, I'm trying to use ptp4l with [system1] and [system2] as slaves with a
> Spectracom Versasync PTP clock as master. Both slave systems are running
> Ubuntu 16.04 with ptp4l v1.8. I am using the default ptp4l.conf file. When I
> run the following command (on system1):
>
> sudo ptp4l -i eth0 -f /etc/linuxptp/ptp4l.conf -m
>
> I get the following output:
> ptp4l: [7769.535] selected /dev/ptp0 as PTP clock
> ptp4l: [7769.537] port 1: INITIALIZING to LISTENING on INITIALIZE
> ptp4l: [7769.537] port 0: INITIALIZING to LISTENING on INITIALIZE
> ptp4l: [7769.537] port 1: link up
> ptp4l: [7770.221] port 1: new foreign master 000cec.fffe.xxxxxx-1

So we did receive an Announce message (I guess on the general port).

> ptp4l: [7770.221] recvmsg failed: No such file or directory

But here recvmsg() returns ENOENT. Strange. That error isn't listed on the man page. I briefly scanned the kernel stack, and there are indeed a few cases where ENOENT can be returned, but I didn't see anything that could apply in this case.

Could this possibly be due to a firewall?

> I know the recvmsg failed error has something to do with the ptp4l
> socket but I'm not sure how to go about fixing it. Please let me
> know if more information is needed.

- uname -r
- ethtool -i
- iptables -L

Thanks,
Richard
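A note on the zero-length datagrams discussed in David's reply: a UDP datagram with an empty payload is valid, and a receive call on it returns 0 bytes without raising an error. The Python sketch below reproduces that over loopback; it does not involve ptp4l, the VersaSync, or port 320. It also shows where the 28 in the iptables rule plausibly comes from, assuming the length match compares against the IPv4 total length and the IP header carries no options. The function names are hypothetical.

```python
import socket

IPV4_HEADER_BYTES = 20  # minimum IPv4 header; assumes no IP options
UDP_HEADER_BYTES = 8


def ipv4_total_length(udp_payload_bytes: int) -> int:
    """IPv4 total length of a UDP datagram carrying the given payload size."""
    return IPV4_HEADER_BYTES + UDP_HEADER_BYTES + udp_payload_bytes


def receive_empty_datagram() -> bytes:
    """Send a zero-length UDP datagram over loopback and return what the receiver reads."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        rx.settimeout(2.0)         # avoid hanging if the datagram is lost
        tx.sendto(b"", rx.getsockname())
        data, _sender = rx.recvfrom(2048)
        return data


payload = receive_empty_datagram()
print(len(payload))                     # 0: a successful receive of an empty datagram
print(ipv4_total_length(len(payload)))  # 28: 20 (IPv4) + 8 (UDP) + 0 (payload)
```

A receiver therefore has to distinguish "received 0 bytes" from "receive failed"; the two are not the same condition for datagram sockets.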
_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users