Problem:
pmc incorrectly reports Grandmasters as connected when in fact they are
physically disconnected from the LAN. This was only fixed after multiple
restarts.
Scenario:
I have two PTP clients (RHEL 7), each using two NICs to sync to two
Grandmasters (Zyfer Gsyncs) using linuxptp-3.1.1. This setup has been working
fine for a year in all versions of linuxptp.
[Why am I running two ptp4l processes? To sync two NIC PHCs, which are used
by NTP as refclocks.]
I use pmc to check if I have synchronization to a Grandmaster.
*Yesterday the two Grandmasters were disconnected from the local Ethernet
switch.*
On one PTP client, pmc correctly reported the disconnect:
pmc -i enp10s0f2 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.37fe82-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset 0
ingress_time 0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange 0
gmTimeBaseIndicator 0
lastGmPhaseChange 0x0000'0000000000000000.0000
gmPresent false <------
gmIdentity b49691.fffe.37fe82 <----------- client
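A minimal sketch of how this check could be scripted, assuming the pmc output has already been captured as text. The function names `parse_time_status` and `gm_present` are hypothetical helpers, not part of linuxptp, and the parser only assumes the "name value" line layout shown above.

```python
def parse_time_status(output: str) -> dict:
    """Parse the text printed by pmc for GET TIME_STATUS_NP into a dict.

    Assumption: each field is printed as "<name> <value>", optionally
    followed by hand-added notes such as "<------".
    """
    fields = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip the "sending:" line, the RESPONSE header, and stray notes:
        # field names contain only letters, digits and underscores.
        if len(parts) < 2 or not parts[0].replace("_", "").isalnum():
            continue
        fields[parts[0]] = parts[1]
    return fields


def gm_present(output: str) -> bool:
    """True only if the output contains 'gmPresent true'."""
    return parse_time_status(output).get("gmPresent") == "true"


# Sample copied from the report above (client that saw the disconnect).
SAMPLE = """\
sending: GET TIME_STATUS_NP
b49691.fffe.37fe82-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset 0
ingress_time 0
gmPresent false <------
gmIdentity b49691.fffe.37fe82 <----------- client
"""

if __name__ == "__main__":
    print(gm_present(SAMPLE))
    print(parse_time_status(SAMPLE)["gmIdentity"])
```

The parser works on captured text only, so it can be tried against pasted output without a running ptp4l.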
On the second PTP client, *the Grandmasters are reported as still present:*
pmc -i enp10s0f0 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.35c204-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset 52
ingress_time 0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange 0
gmTimeBaseIndicator 0
lastGmPhaseChange 0x0000'0000000000000000.0000
gmPresent true
gmIdentity 0019dd.fffe.002009
pmc -i enp10s0f2 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.35c206-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset 113
ingress_time 0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange 0
gmTimeBaseIndicator 0
lastGmPhaseChange 0x0000'0000000000000000.0000
gmPresent true <------
gmIdentity 0019dd.fffe.001ffb <------ Grandmaster
However, "systemctl -l status ptp4l-1.service" and "... ptp4l-2.service"
correctly report that the connections are down:
systemctl -l status ptp4l-2
● ptp4l-2.service - Precision Time Protocol (PTP) service second interface
Loaded: loaded (/etc/systemd/system/ptp4l-2.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2022-09-27 20:56:02 UTC; 10s ago
Process: 2527 ExecStart=/usr/local/linuxptp/sbin/ptp4l $OPTIONS2 (code=exited, status=0/SUCCESS)
Main PID: 2527 (code=exited, status=0/SUCCESS)
Sep 27 20:54:55 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1618.728]: selected local clock b49691.fffe.37fe82 as best master
Sep 27 20:55:03 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1626.969]: selected local clock b49691.fffe.37fe82 as best master
Sep 27 20:55:13 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1636.869]: selected local clock b49691.fffe.37fe82 as best master
Sep 27 20:55:23 dc-ntp01.rdte.usno.navy.mil ptp4l[2527]: ptp4l[1646.184]: selected local clock b49691.fffe.37fe82 as best master
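The disagreement between the two views (systemctl versus pmc) could be flagged automatically. The following is only a sketch: `unit_is_active` and `describe_mismatch` are hypothetical helper names, the unit name "ptp4l-2" is taken from the status output above, and the rule that an inactive unit should never coincide with "gmPresent true" is an assumption about what counts as inconsistent here, not documented linuxptp behaviour.

```python
import subprocess


def unit_is_active(unit: str) -> bool:
    """Return True if `systemctl is-active <unit>` prints "active"."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=False,  # is-active exits non-zero for inactive units
    )
    return result.stdout.strip() == "active"


def describe_mismatch(unit: str, active: bool, gm_present: bool):
    """Return a warning string when the two views disagree, else None.

    Assumption: gmPresent true while the ptp4l unit is not active is
    treated as the inconsistent case described in this report.
    """
    if gm_present and not active:
        return f"{unit} is not active, yet pmc reports gmPresent true"
    return None


if __name__ == "__main__":
    # Values as observed on the second client in this report.
    print(describe_mismatch("ptp4l-2", active=False, gm_present=True))
```

Keeping the comparison separate from the `systemctl` call means the rule can be checked with recorded values on a machine that has no ptp4l units.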
So I restart my ptp4l services (several times), but pmc still reports that
it sees the two Grandmasters.
Next I reboot the server, but it still "sees" the two Grandmasters.
Meanwhile the other server does not see them.
I copy the pmc binary from the server that does not see the Grandmasters to
the one that still does. Same result: systemctl reports no connection, but
pmc still sees the Grandmasters on one of my two clients.
Eventually, following multiple stops and starts of ptp4l, pmc starts
reporting correctly:
pmc -i enp10s0f0 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.35c204-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset 0
ingress_time 0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange 0
gmTimeBaseIndicator 0
lastGmPhaseChange 0x0000'0000000000000000.0000
gmPresent false
gmIdentity b49691.fffe.35c204
-bash-4.2# pmc -i enp10s0f2 "GET TIME_STATUS_NP"
sending: GET TIME_STATUS_NP
b49691.fffe.35c206-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
master_offset 0
ingress_time 0
cumulativeScaledRateOffset +0.000000000
scaledLastGmPhaseChange 0
gmTimeBaseIndicator 0
lastGmPhaseChange 0x0000'0000000000000000.0000
gmPresent false
gmIdentity b49691.fffe.35c206
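In the correct outputs above, gmIdentity equals the clock identity of the port that answered (the part of the RESPONSE header before "-1"), which matches the "selected local clock ... as best master" log lines. A small sketch of that comparison follows, with the hypothetical helper names `responder_and_gm` and `is_own_grandmaster`, and assuming the text holds a single response.

```python
def responder_and_gm(output: str):
    """Return (responder clock identity, gmIdentity) from pmc output text."""
    responder = gm = None
    for line in output.splitlines():
        parts = line.split()
        if "RESPONSE" in parts:
            # Port identity is "<clockIdentity>-<portNumber>",
            # e.g. "b49691.fffe.35c206-1"; drop the port number.
            responder = parts[0].rsplit("-", 1)[0]
        elif len(parts) >= 2 and parts[0] == "gmIdentity":
            gm = parts[1]
    return responder, gm


def is_own_grandmaster(output: str) -> bool:
    """True if the responding clock reports itself as the grandmaster."""
    responder, gm = responder_and_gm(output)
    return responder is not None and responder == gm


# Samples copied from the report: after recovery, and while still stale.
RECOVERED = """\
b49691.fffe.35c206-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
gmPresent false
gmIdentity b49691.fffe.35c206
"""
STALE = """\
b49691.fffe.35c206-1 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP
gmPresent true <------
gmIdentity 0019dd.fffe.001ffb <------ Grandmaster
"""

if __name__ == "__main__":
    print(is_own_grandmaster(RECOVERED))
    print(is_own_grandmaster(STALE))
```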
Baffled,
Richard Schmidt
Precise Time Dept
US Naval Observatory