I've been testing linuxptp for about a year (now version 1.8) and am still
seeing the following failure, which always appears after 8 or more days of
successful operation:

port 1: send delay request failed
port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)

It then goes into an endless loop of attempting to clear the fault, getting
some good master offsets, then faulting again. *Killing and restarting ptp4l
does not fix the problem. It requires a power cycle of the server (and
therefore of the NIC).*
That makes me suspect the igb driver itself.

In the latest test linuxptp ran for *14.7 days* before FAULTing, leading me
to wonder whether there is some memory leak in the igb driver. Is igb driver
5.3.5.4 supported by linuxptp? Should I be using an earlier version?

Host: Cisco C240M4, Red Hat Enterprise Linux 7, kernel 3.10.0-327.28.2.el7.x86_64
NIC: i350

ethtool -T enp1s0f0
Time stamping parameters for enp1s0f0:
Capabilities:
hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
off                   (HWTSTAMP_TX_OFF)
on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
none                  (HWTSTAMP_FILTER_NONE)
all                   (HWTSTAMP_FILTER_ALL)
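
The receive filter list above offers only "none" and "all", which seems to line up with the "rx_filter 1 not 12" lines in the ptp4l log further down. The following is a minimal Python sketch for decoding those numbers. The numeric values are taken from the kernel header linux/net_tstamp.h; the function name is made up for illustration, and reading the first number as what the driver applied and the second as what ptp4l requested is an interpretation of the message wording, not something stated in this post.

```python
import re

# Subset of enum hwtstamp_rx_filters from linux/net_tstamp.h.
# Only the values relevant to this report are listed.
HWTSTAMP_RX_FILTERS = {
    0: "HWTSTAMP_FILTER_NONE",
    1: "HWTSTAMP_FILTER_ALL",
    12: "HWTSTAMP_FILTER_PTP_V2_EVENT",
}

# Matches the tail of a ptp4l line such as "rx_filter 1 not 12".
RX_FILTER_RE = re.compile(r"rx_filter\s+(\d+)\s+not\s+(\d+)")


def decode_rx_filter_line(line):
    """Return (applied, requested) filter names for an 'rx_filter' log line.

    Hypothetical helper: assumes the first number is the value the driver
    applied and the second is the value that was requested.
    Returns None if the line does not contain an rx_filter message.
    """
    match = RX_FILTER_RE.search(line)
    if match is None:
        return None
    applied, requested = (int(group) for group in match.groups())
    return (
        HWTSTAMP_RX_FILTERS.get(applied, "unknown (%d)" % applied),
        HWTSTAMP_RX_FILTERS.get(requested, "unknown (%d)" % requested),
    )


if __name__ == "__main__":
    print(decode_rx_filter_line("ptp4l[4906561.305]: rx_filter 1 not 12"))
```

Under that reading, the driver is timestamping all received packets rather than only PTPv2 event messages, which is consistent with the two filter modes ethtool reports and is probably not an error in itself.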

ethtool -i enp1s0f0
driver: igb
version: 5.3.5.4
firmware-version: 1.63, 0x80000c25, 0.384.130
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

Valgrind finds no memory leak in ptp4l:

==3544== HEAP SUMMARY:
==3544==     in use at exit: 0 bytes in 0 blocks
==3544==   total heap usage: 200 allocs, 200 frees, 40,202 bytes allocated
==3544==
==3544== All heap blocks were freed -- no leaks are possible
==3544==
==3544== For counts of detected and suppressed errors, rerun with: -v
==3544== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)

Example output, running fine and then generating the FAULT:

ptp4l[4906535.703]: linreg: points 16 slope 0.999999322 intercept 15 err 29
ptp4l[4906535.703]: master offset        -48 s2 freq    +663 path delay      1647
ptp4l[4906536.703]: linreg: points 16 slope 0.999999323 intercept 17 err 30
ptp4l[4906536.703]: master offset        -72 s2 freq    +660 path delay      1647
ptp4l[4906537.703]: linreg: points 16 slope 0.999999323 intercept -5 err 30
ptp4l[4906537.703]: master offset         28 s2 freq    +682 path delay      1647
ptp4l[4906538.703]: linreg: points 16 slope 0.999999325 intercept 10 err 29
ptp4l[4906538.703]: master offset        -14 s2 freq    +666 path delay      1647
ptp4l[4906539.478]: port 1: delay timeout
ptp4l[4906539.503]: delay   filtered       1650   raw       1667
ptp4l[4906539.703]: linreg: points 16 slope 0.999999325 intercept 3 err 29
ptp4l[4906539.703]: master offset         -3 s2 freq    +671 path delay      1650
ptp4l[4906540.701]: port 1: delay timeout
ptp4l[4906540.704]: linreg: points 16 slope 0.999999327 intercept 15 err 30
ptp4l[4906540.704]: master offset        -75 s2 freq    +658 path delay      1650
ptp4l[4906540.738]: delay   filtered       1653   raw      15548
ptp4l[4906541.703]: linreg: points 16 slope 0.999999328 intercept 7 err 30
ptp4l[4906541.703]: master offset        -17 s2 freq    +666 path delay      1653
ptp4l[4906542.703]: linreg: points 16 slope 0.999999330 intercept 19 err 30
ptp4l[4906542.703]: master offset        -43 s2 freq    +651 path delay      1653
ptp4l[4906543.703]: linreg: points 16 slope 0.999999331 intercept 4 err 30
ptp4l[4906543.703]: master offset        -14 s2 freq    +666 path delay      1653
ptp4l[4906544.301]: port 1: delay timeout
ptp4l[4906545.303]: timed out while polling for tx timestamp
ptp4l[4906545.303]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906545.303]: port 1: send delay request failed
ptp4l[4906545.303]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906545.303]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[4906561.303]: clearing fault on port 1
ptp4l[4906561.303]: config item enp1s0f0.logMinDelayReqInterval is 2
ptp4l[4906561.303]: config item enp1s0f0.logAnnounceInterval is 0
ptp4l[4906561.303]: config item enp1s0f0.announceReceiptTimeout is 4
ptp4l[4906561.303]: config item enp1s0f0.syncReceiptTimeout is 0
ptp4l[4906561.303]: config item enp1s0f0.transportSpecific is 0
ptp4l[4906561.303]: config item enp1s0f0.logSyncInterval is 0
ptp4l[4906561.303]: config item enp1s0f0.logMinPdelayReqInterval is 2
ptp4l[4906561.303]: config item enp1s0f0.neighborPropDelayThresh is 20000000
ptp4l[4906561.303]: config item enp1s0f0.min_neighbor_prop_delay is -20000000
ptp4l[4906561.303]: config item enp1s0f0.udp_ttl is 1
ptp4l[4906561.305]: driver changed our HWTSTAMP options
ptp4l[4906561.305]: tx_type   1 not 1
ptp4l[4906561.305]: rx_filter 1 not 12
ptp4l[4906561.305]: config item (null).dscp_event is 0
ptp4l[4906561.305]: config item (null).dscp_general is 0
ptp4l[4906561.305]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[4906561.703]: port 1: setting asCapable
ptp4l[4906561.713]: port 1: new foreign master 0019dd.fffe.00085c-1
ptp4l[4906563.713]: selected best master clock 0019dd.fffe.00085c
ptp4l[4906563.713]: foreign master not using PTP timescale
ptp4l[4906563.713]: running in a temporal vortex
ptp4l[4906563.713]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[4906564.704]: linreg: points 8 slope 0.999999339 intercept 123 err 31
ptp4l[4906564.704]: master offset       -120 s2 freq    +538 path delay      1653
ptp4l[4906564.704]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[4906565.704]: linreg: points 8 slope 0.999999340 intercept 29 err 31
ptp4l[4906565.704]: master offset        -59 s2 freq    +631 path delay      1653
ptp4l[4906566.704]: linreg: points 8 slope 0.999999340 intercept 3 err 31
ptp4l[4906566.704]: master offset        -10 s2 freq    +656 path delay      1653
ptp4l[4906567.704]: linreg: points 8 slope 0.999999339 intercept -16 err 31
ptp4l[4906567.704]: master offset         53 s2 freq    +676 path delay      1653
ptp4l[4906567.720]: port 1: delay timeout
ptp4l[4906568.721]: timed out while polling for tx timestamp
ptp4l[4906568.721]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906568.721]: port 1: send delay request failed
ptp4l[4906568.721]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906568.721]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[4906584.722]: clearing fault on port 1
ptp4l[4906584.722]: config item enp1s0f0.logMinDelayReqInterval is 2
ptp4l[4906584.722]: config item enp1s0f0.logAnnounceInterval is 0
ptp4l[4906584.722]: config item enp1s0f0.announceReceiptTimeout is 4
ptp4l[4906584.722]: config item enp1s0f0.syncReceiptTimeout is 0
ptp4l[4906584.722]: config item enp1s0f0.transportSpecific is 0
ptp4l[4906584.722]: config item enp1s0f0.logSyncInterval is 0
ptp4l[4906584.722]: config item enp1s0f0.logMinPdelayReqInterval is 2
ptp4l[4906584.722]: config item enp1s0f0.neighborPropDelayThresh is 20000000
ptp4l[4906584.722]: config item enp1s0f0.min_neighbor_prop_delay is -20000000
ptp4l[4906584.722]: config item enp1s0f0.udp_ttl is 1
ptp4l[4906584.722]: driver changed our HWTSTAMP options
ptp4l[4906584.722]: tx_type   1 not 1
ptp4l[4906584.722]: rx_filter 1 not 12
ptp4l[4906584.722]: config item (null).dscp_event is 0
ptp4l[4906584.722]: config item (null).dscp_general is 0
ptp4l[4906584.722]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[4906585.704]: port 1: setting asCapable
ptp4l[4906585.714]: port 1: new foreign master 0019dd.fffe.00085c-1
ptp4l[4906587.714]: selected best master clock 0019dd.fffe.00085c
ptp4l[4906587.714]: foreign master not using PTP timescale
ptp4l[4906587.714]: running in a temporal vortex
ptp4l[4906587.714]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[4906588.705]: linreg: points 8 slope 0.999999345 intercept 506 err 37
ptp4l[4906588.705]: master offset       -645 s2 freq    +149 path delay      1653
ptp4l[4906588.705]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[4906589.705]: linreg: points 8 slope 0.999999347 intercept 81 err 40
ptp4l[4906589.705]: master offset       -194 s2 freq    +572 path delay      1653
ptp4l[4906590.705]: linreg: points 8 slope 0.999999351 intercept 67 err 43
ptp4l[4906590.705]: master offset       -166 s2 freq    +582 path delay      1653
ptp4l[4906591.705]: linreg: points 8 slope 0.999999356 intercept 62 err 43
ptp4l[4906591.705]: master offset        -68 s2 freq    +582 path delay      1653
ptp4l[4906592.417]: port 1: delay timeout
ptp4l[4906593.418]: timed out while polling for tx timestamp
ptp4l[4906593.418]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906593.418]: port 1: send delay request failed
ptp4l[4906593.418]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906593.418]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[4906609.419]: clearing fault on port 1
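
To put numbers on the fault loop shown in the log above, here is a small Python sketch that pulls the timestamps of each transition into FAULTY out of a ptp4l log and prints the time between consecutive faults. The function names and the embedded sample are illustrative only; the sample lines are the three fault transitions from the log above, and the regular expression assumes the message format seen there.

```python
import re

# Matches e.g. "ptp4l[4906545.303]: port 1: SLAVE to FAULTY on FAULT_DETECTED".
# The trailing "(FT_UNSPECIFIED)" is not required, so wrapped lines also match.
FAULT_RE = re.compile(
    r"ptp4l\[(\d+\.\d+)\]: port \d+: \w+ to FAULTY on FAULT_DETECTED"
)


def fault_times(log_text):
    """Return the ptp4l timestamps (seconds) of every transition to FAULTY."""
    return [float(match.group(1)) for match in FAULT_RE.finditer(log_text)]


def fault_intervals(times):
    """Return the gaps in seconds between consecutive faults, to 1 ms."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


# Fault transitions copied from the log in this post.
SAMPLE = """\
ptp4l[4906545.303]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906568.721]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906593.418]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
"""

if __name__ == "__main__":
    times = fault_times(SAMPLE)
    for gap in fault_intervals(times):
        print("%.3f s since previous fault" % gap)
```

For the excerpt above the gaps come out at roughly 23 to 25 seconds, of which 16 seconds is the "waiting 2^{4} seconds to clear fault" back-off; the port only stays in SLAVE for a few seconds before the next delay request fails.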

Rich Schmidt, CTR, USNO

-- 
"If you want to build a ship, don’t drum up people to collect wood and
don’t assign them tasks and work, but rather teach them to long for the
endless immensity of the sea."

- *Antoine de Saint-Exupéry*