Re: [Linuxptp-users] ptp4l confusing UTC offset of different masters

Luke Bigum Fri, 03 May 2019 04:41:21 -0700

I think I can explain this away as expected behaviour / quirks between 
different clients and masters.  The key piece of information I was missing was 
what happens to the PHC as soon as ptp4l starts.


What I hadn't seen is that ptp4l immediately stepped the clock on startup based 
on the VS1's faster announce messages.  So the clock is stepped based on the 
VS1's poorly advertised time, and then when the slower announce messages from 
the VS2 take priority the PHC is already 37 seconds out.  Even though no names 
are named, I think the VS1 sending "creative" PTP Announce messages has been 
discussed here previously 
(https://sourceforge.net/p/linuxptp/mailman/message/36006501/).

We've only seen this problem on Metamako switches because we usually run PTPd 
as the client on all our Linux servers. There are some non-PTP spec compliant 
configuration options in PTPd that allow it to be a little more flexible with 
how it interprets the advertised UTC offset - it can and has been told to 
absolutely trust the UTC Offset field, no matter what other flags are set.  
I've got the vague feeling I knew this 2 years ago and have forgotten...  
Either way, I think ptp4l can't handle the craziness because it's doing the 
"right thing".

Since you're never going to make ptp4l less compliant, I think the only 
solution here is to never mix these two types of Grand Masters.
________________________________
From: Luke Bigum <[email protected]>
Sent: 30 April 2019 17:16
To: linuxptp-users
Subject: [Linuxptp-users] ptp4l confusing UTC offset of different masters

Hello,

I've got some odd PTP behaviour from a Metamako switch (which runs it's own 
version of linuxptp-1.9), and have reproduced similar behaviour with 
linuxptp-2.0.

We've got a new Grand Master appliance on trial, an Orelia VelaSync 2.  It's 
plugged into a LAN in our lab along side it's predecessor, the VelaSync 1 
(which is really FSM Labs' TimeKeeper Server).  These masters deal with the UTC 
offset and leap seconds differently.  The VS1 / TimeKeeper takes the inventive 
approach of publishing PTP messages containing UTC time and a UTC offset of 
zero (it does this so it can be in control of leap second slewing).  The VS2 
simply publishes TAI time with a UTC offset of 37 seconds.  As I understand it, 
net result should be the same: UTC - 0s and TAI - 37s.

I've set the VS2 priority1 field to 127, higher than the VS1's priority1 of 128 
(the point of the exercise is to test the VS2).

What appears to happen is ptp4l first tries to calibrate to the VS1 (the VS1 
sends a boat load more Announce messages than the VS2), and sets the UTC offset 
to zero:

ptp4l[419887.425]: selected best master clock 248a07.fffe.fdb701
ptp4l[419887.425]: updating UTC offset to 0
ptp4l[419887.425]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[419888.699]: rms  172 max  221 freq -24777 +/- 102 delay  2141 +/-   7
ptp4l[419889.727]: rms   96 max  141 freq -24730 +/-  28 delay  2142 +/-   4
ptp4l[419890.739]: rms   54 max   79 freq -24747 +/-  41 delay  2145 +/-   3

Not long after the VS2 is discovered, and selected as the best clock.  The new 
UTC offset flag is set to match the different time in the Follow_Up messages, 
but the clock jumps 37 seconds anyway:

ptp4l[419888.434]: port 1: new foreign master ec0d9a.fffe.2a5680-1
ptp4l[419892.434]: selected best master clock ffffff.ffff.ffffff
ptp4l[419892.434]: updating UTC offset to 37
ptp4l[419892.434]: port 1: SLAVE to UNCALIBRATED on RS_SLAVE
ptp4l[419892.811]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[419893.306]: selected best master clock ffffff.ffff.ffffff
ptp4l[419893.306]: updating UTC offset to 37
ptp4l[419893.434]: rms 32034559090 max 37000000060 freq -23256182 +/- 13412686 
delay  2139 +/-  20
ptp4l[419894.434]: rms 36963221067 max 36976771182 freq -31000000 +/-   0
ptp4l[419894.841]: selected best master clock ffffff.ffff.ffffff
ptp4l[419894.841]: updating UTC offset to 37
ptp4l[419895.352]: selected best master clock ffffff.ffff.ffffff
ptp4l[419895.352]: updating UTC offset to 37
ptp4l[419895.434]: rms 36930870374 max 36945061895 freq -31000000 +/-   0 delay 
-1099123 +/- 367074

Note the clock ID of 0xffff here is a bogus value from the VS2 announce 
messages, it's populating it's grandmasterClockIdentity field with a reserved 
address. I've already emailed Orelia about it (side question; why is ptp4l 
logging the grandmasterClockIdentity field in the log file as the "master" and 
not the clockIdentity field?).

When we were just running the VS1, everything was fine, this worked.  If you 
shut off the VS1 (or change it's PTP domain number) so just the VS2 messages 
are "seen", everything is also fine, it works.  The craziness only occurs when 
there is a mix of messages from both masters on the LAN.  Also, this doesn't 
affect 'ptpd' (which is what most of my clients run), it only affects ptp4l.

The other odd thing from the log (and the Metamako switch) is it constantly 
spits out these messages:

selected best master clock ffffff.ffff.ffffff

Which comes from handle_state_decision_event() in clock.c (remember, ignore the 
clock ID), and:

updating UTC offset to 37

Which is in clock_update_slave().  That seems odd to me... I would expect to 
see them during/after a BMC election, not over and over again.  The PTP 
priority hasn't changed, VS2 is still best clock.  Part of the condition of the 
second message is:

(c->tds.currentUtcOffset > c->utc_offset)

I wonder if the VS1 messages are somehow being processed (even though it's not 
master), and the UTC offset flag is being polluted and this is what's jamming 
the clock away from real time?

Full ptp4l log file is attached.


--

Luke Bigum


Head of Systems, LMAX Exchange

_______________________________________________
Linuxptp-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linuxptp-users

Re: [Linuxptp-users] ptp4l confusing UTC offset of different masters

Reply via email to