On Wed, Apr 05, 2017 at 02:34:14PM +0000, Ian Thompson wrote:
> Why is the time that gets put into the PTP registers in the STM MAC, Unix 
> time rather than PTP time?

See below.

To your question from the other thread:

On Tue, Apr 04, 2017 at 03:45:16PM +0000, Ian Thompson wrote:
> Possibly following on from David’s post.
> 
> We have a system with 18 boards in a rack, each board has a Altera SoC with 
> the STM Ethernet MAC connected via gigabit Ethernet to an Arista ptp-aware 
> switch and then a Spectracom GrandMaster.
> The boards are running Linux kernel 3.15.0.

That HW puts the time stamps into the buffer descriptor, and so in
theory it should never miss a time stamp.  This is most likely a
driver bug.  Looking at the git log I see:

 v4.11-rc1~124^2~171^2~12 deeb637 net: stmmac: remove freesoftware address
     v4.9-rc7~33^2~33^2~1 ba1ffd7 stmmac: fix PTP support for GMAC4
     v4.9-rc7~33^2~33^2~2 d204205 stmmac: update the PTP header file
         v4.9-rc4~28^2~68 c30a70d stmmac: fix and review the ptp registration.
         v4.9-rc4~28^2~96 50756eb stmmac: fix an error code in stmmac_ptp_register()
         v4.9-rc1~28^2~10 7086605 stmmac: fix error check when init ptp
       v4.9-rc1~127^2~108 efee95f ptp_clock: future-proofing drivers against PTP subsystem becoming optional
   v4.1-rc1~128^2~100^2~5 e7ea55b ptp: stmmac: use helpers for converting ns to timespec.
   v4.1-rc1~128^2~119^2~6 3f6c465 ptp: stmmac: convert to the 64 bit get/set time methods.
        v3.17-rc5~41^2~38 5566401 stmmac: ptp: fix the reference clock
        v3.17-rc5~41^2~50 f95f404 stmmac: set ptp_clock to NULL while unregister
  v3.15-rc1~113^2~108^2~5 4986b4f0 ptp: drivers: set the number of programmable pins.
           v3.13-rc7~13^2 7cd0139 stmmac: Fix incorrect spinlock release and PTP cap detection.
       v3.10-rc1~66^2~195 32ceabc stmmac: improve/review and fix kernel-doc
   v3.10-rc1~66^2~327^2~1 92ba688 stmmac: add the support for PTP hw clock driver
   v3.10-rc1~66^2~327^2~2 891434b stmmac: add IEEE PTPv1 and PTPv2 support.

Especially ba1ffd7 looks suspicious.

> Apr  4 13:42:04 localhost user.info   ptp4l: [537.164] rms  123 max  599 freq 
>   +255 +/-  39 delay  7362 +/-  48
> Apr  4 13:42:29 localhost user.err    ptp4l: [561.387] timed out while 
> polling for tx timestamp
> Apr  4 13:42:29 localhost user.err    ptp4l: [561.387] increasing 
> tx_timestamp_timeout may correct this issue, but it is likely caused by a 
> driver bug
> Apr  4 13:42:29 localhost user.err    ptp4l: [561.387] port 1: send delay 
> request failed
> Apr  4 13:42:29 localhost user.notice ptp4l: [561.387] port 1: SLAVE to 
> FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> Apr  4 13:42:45 localhost user.notice ptp4l: [577.388] port 1: FAULTY to 
> LISTENING on FAULT_CLEARED
> Apr  4 13:42:45 localhost user.warn   ptp4l: [577.414] clockcheck: clock 
> jumped backward or running slower than expected!
> Apr  4 13:42:45 localhost user.notice ptp4l: [577.414] port 1: new foreign 
> master 000cec.fffe.0a085d-1
> Apr  4 13:42:47 localhost user.notice ptp4l: [579.414] selected best master 
> clock 000cec.fffe.0a085d
> Apr  4 13:42:47 localhost user.notice ptp4l: [579.414] port 1: LISTENING to 
> UNCALIBRATED on RS_SLAVE
> Apr  4 13:42:54 localhost user.notice ptp4l: [587.164] port 1: UNCALIBRATED 
> to SLAVE on MASTER_CLOCK_SELECTED
> Apr  4 13:46:46 localhost user.info   ptp4l: [818.414] rms 2312500092 max 
> 37000001557 freq   +246 +/- 250 delay  7358 +/-  46
> Apr  4 13:51:02 localhost user.info   ptp4l: [1074.413] rms  116 max  681 
> freq   +256 +/-  48 delay  7373 +/-  88
> 
> Does this imply that one lost delay request can do this, or is there a retry 
> mechanism?

One lost delay request shouldn't introduce such a large error.  This
is a driver bug.  Notice that the time error is 37 seconds, which is
exactly the current UTC/TAI offset.
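
As a quick check of that arithmetic, here is a small Python sketch that
splits the "max" value from the quoted ptp4l summary line into whole
seconds and a nanosecond remainder.  The helper name split_offset is
only for illustration, and the 37 s constant is the TAI-UTC offset in
effect since the start of 2017.

```python
NS_PER_S = 1_000_000_000
TAI_UTC_OFFSET_S = 37  # TAI minus UTC, valid since 2017-01-01


def split_offset(offset_ns):
    """Split a ptp4l offset in nanoseconds into (whole seconds, leftover ns)."""
    return divmod(offset_ns, NS_PER_S)


# "max 37000001557" from the summary line logged at 13:46:46
seconds, leftover_ns = split_offset(37_000_001_557)

# A whole-second part equal to the TAI-UTC offset points at a timescale
# mix-up rather than at a lost packet.
looks_like_timescale_mixup = (seconds == TAI_UTC_OFFSET_S)
print(seconds, leftover_ns, looks_like_timescale_mixup)
```

What is left after removing the 37 seconds is on the order of a
microsecond, comparable to the sub-microsecond "max" values seen while
the port was locked.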

When resetting the fault, ptp4l re-initializes HW time stamping.

The function, stmmac_hwtstamp_ioctl(), in

    drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 

programs the system time (UTC) into the PHC every time HW time
stamping is enabled.  It shouldn't do that.
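
One way to observe this from user space is to compare the PHC with the
system clock before and after the fault.  Below is a rough sketch in
Python, not a tested tool.  It assumes Python 3.7 or later on Linux,
that the PHC is /dev/ptp0 (ethtool -T on the interface reports the
index), that the grandmaster distributes the PTP (TAI) timescale, and
that the system clock holds UTC.  The clock id computation mirrors the
FD_TO_CLOCKID macro in the kernel's testptp.c; the function names are
made up for this example.

```python
import os
import time

CLOCKFD = 3  # same constant as in the kernel's testptp.c


def fd_to_clockid(fd):
    """Turn an open /dev/ptpN file descriptor into a dynamic POSIX clock id."""
    return ((~fd) << 3) | CLOCKFD


def phc_minus_system_ns(device="/dev/ptp0"):
    """Return PHC time minus CLOCK_REALTIME, in nanoseconds.

    The device path is an assumption; adjust it to the PHC index of
    the interface under test.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        phc_ns = time.clock_gettime_ns(fd_to_clockid(fd))
        system_ns = time.clock_gettime_ns(time.CLOCK_REALTIME)
    finally:
        os.close(fd)
    return phc_ns - system_ns
```

Under those assumptions, phc_minus_system_ns() should return roughly
37e9 while ptp4l is locked.  If it drops to about zero right after the
port leaves FAULTY, the PHC has been reloaded with system time.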

> We have a lot of traffic leaving the boards but only PTP traffic
> coming in. As we increase the off board transfer rates the problem
> seems to occur more often.

That could indicate a driver or a HW issue, or both.

HTH,
Richard
