I've been testing linuxptp for about a year (currently version 1.8), and I
still see the following failure, always after 8 or more days of successful
operation:
port 1: send delay request failed
port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l then goes into an endless loop of clearing the fault, getting a few
good master offsets, and faulting again. *Killing and restarting ptp4l
does not fix the problem; it takes a power cycle of the server (and
therefore of the NIC).*
That makes me suspect the igb driver itself.
In the latest test, linuxptp ran for *14.7 days* before faulting, which makes
me wonder whether there is a memory leak in the igb driver. Is igb driver
5.3.5.4 supported by linuxptp? Should I be using an earlier version?
Host: Cisco C240M4, Red Hat Enterprise Linux 7, kernel 3.10.0-327.28.2.el7.x86_64
NIC: i350
ethtool -T enp1s0f0
Time stamping parameters for enp1s0f0:
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
        off                   (HWTSTAMP_TX_OFF)
        on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
        none                  (HWTSTAMP_FILTER_NONE)
        all                   (HWTSTAMP_FILTER_ALL)
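
The receive filter modes listed above are relevant to the "driver changed our
HWTSTAMP options" lines that appear in the ptp4l log further down
("tx_type 1 not 1", "rx_filter 1 not 12"). The sketch below decodes those
numbers. It rests on two assumptions that should be checked locally: the
numeric enum values are quoted from memory of <linux/net_tstamp.h>, and the
first number is taken to be what the driver set while the second is what
ptp4l asked for. The function name is made up for illustration.

```python
import re

# Enum values as recalled from <linux/net_tstamp.h> (assumption: verify
# against the headers of the running kernel). Only the values that show
# up in this report are listed.
HWTSTAMP_TX_TYPES = {
    0: "HWTSTAMP_TX_OFF",
    1: "HWTSTAMP_TX_ON",
}
HWTSTAMP_RX_FILTERS = {
    0: "HWTSTAMP_FILTER_NONE",
    1: "HWTSTAMP_FILTER_ALL",
    12: "HWTSTAMP_FILTER_PTP_V2_EVENT",
}

WARNING_RE = re.compile(r"(tx_type|rx_filter)\s+(\d+)\s+not\s+(\d+)")


def decode_hwtstamp_warning(line):
    """Decode a 'tx_type A not B' / 'rx_filter A not B' log line.

    Returns (field, driver_value_name, requested_value_name), or None if
    the line is not one of these warnings. Assumes A is the value the
    driver reported back and B is the value ptp4l requested.
    """
    m = WARNING_RE.search(line)
    if m is None:
        return None
    field = m.group(1)
    table = HWTSTAMP_TX_TYPES if field == "tx_type" else HWTSTAMP_RX_FILTERS
    got, wanted = int(m.group(2)), int(m.group(3))
    return (
        field,
        table.get(got, "unknown(%d)" % got),
        table.get(wanted, "unknown(%d)" % wanted),
    )


if __name__ == "__main__":
    print(decode_hwtstamp_warning("ptp4l[4906561.305]: rx_filter 1 not 12"))
    print(decode_hwtstamp_warning("ptp4l[4906561.305]: tx_type 1 not 1"))
```

If those assumptions hold, "rx_filter 1 not 12" just means the driver fell
back to time stamping all packets, which matches the two filter modes that
ethtool reports for this NIC, so that warning by itself is probably not the
fault.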
ethtool -i enp1s0f0
driver: igb
version: 5.3.5.4
firmware-version: 1.63, 0x80000c25, 0.384.130
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
Valgrind finds no memory leak in ptp4l:
==3544== HEAP SUMMARY:
==3544== in use at exit: 0 bytes in 0 blocks
==3544== total heap usage: 200 allocs, 200 frees, 40,202 bytes allocated
==3544==
==3544== All heap blocks were freed -- no leaks are possible
==3544==
==3544== For counts of detected and suppressed errors, rerun with: -v
==3544== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
Example output, running fine at first and then faulting:
ptp4l[4906535.703]: linreg: points 16 slope 0.999999322 intercept 15 err 29
ptp4l[4906535.703]: master offset -48 s2 freq +663 path delay 1647
ptp4l[4906536.703]: linreg: points 16 slope 0.999999323 intercept 17 err 30
ptp4l[4906536.703]: master offset -72 s2 freq +660 path delay 1647
ptp4l[4906537.703]: linreg: points 16 slope 0.999999323 intercept -5 err 30
ptp4l[4906537.703]: master offset 28 s2 freq +682 path delay 1647
ptp4l[4906538.703]: linreg: points 16 slope 0.999999325 intercept 10 err 29
ptp4l[4906538.703]: master offset -14 s2 freq +666 path delay 1647
ptp4l[4906539.478]: port 1: delay timeout
ptp4l[4906539.503]: delay filtered 1650 raw 1667
ptp4l[4906539.703]: linreg: points 16 slope 0.999999325 intercept 3 err 29
ptp4l[4906539.703]: master offset -3 s2 freq +671 path delay 1650
ptp4l[4906540.701]: port 1: delay timeout
ptp4l[4906540.704]: linreg: points 16 slope 0.999999327 intercept 15 err 30
ptp4l[4906540.704]: master offset -75 s2 freq +658 path delay 1650
ptp4l[4906540.738]: delay filtered 1653 raw 15548
ptp4l[4906541.703]: linreg: points 16 slope 0.999999328 intercept 7 err 30
ptp4l[4906541.703]: master offset -17 s2 freq +666 path delay 1653
ptp4l[4906542.703]: linreg: points 16 slope 0.999999330 intercept 19 err 30
ptp4l[4906542.703]: master offset -43 s2 freq +651 path delay 1653
ptp4l[4906543.703]: linreg: points 16 slope 0.999999331 intercept 4 err 30
ptp4l[4906543.703]: master offset -14 s2 freq +666 path delay 1653
ptp4l[4906544.301]: port 1: delay timeout
ptp4l[4906545.303]: timed out while polling for tx timestamp
ptp4l[4906545.303]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906545.303]: port 1: send delay request failed
ptp4l[4906545.303]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906545.303]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[4906561.303]: clearing fault on port 1
ptp4l[4906561.303]: config item enp1s0f0.logMinDelayReqInterval is 2
ptp4l[4906561.303]: config item enp1s0f0.logAnnounceInterval is 0
ptp4l[4906561.303]: config item enp1s0f0.announceReceiptTimeout is 4
ptp4l[4906561.303]: config item enp1s0f0.syncReceiptTimeout is 0
ptp4l[4906561.303]: config item enp1s0f0.transportSpecific is 0
ptp4l[4906561.303]: config item enp1s0f0.logSyncInterval is 0
ptp4l[4906561.303]: config item enp1s0f0.logMinPdelayReqInterval is 2
ptp4l[4906561.303]: config item enp1s0f0.neighborPropDelayThresh is 20000000
ptp4l[4906561.303]: config item enp1s0f0.min_neighbor_prop_delay is -20000000
ptp4l[4906561.303]: config item enp1s0f0.udp_ttl is 1
ptp4l[4906561.305]: driver changed our HWTSTAMP options
ptp4l[4906561.305]: tx_type 1 not 1
ptp4l[4906561.305]: rx_filter 1 not 12
ptp4l[4906561.305]: config item (null).dscp_event is 0
ptp4l[4906561.305]: config item (null).dscp_general is 0
ptp4l[4906561.305]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[4906561.703]: port 1: setting asCapable
ptp4l[4906561.713]: port 1: new foreign master 0019dd.fffe.00085c-1
ptp4l[4906563.713]: selected best master clock 0019dd.fffe.00085c
ptp4l[4906563.713]: foreign master not using PTP timescale
ptp4l[4906563.713]: running in a temporal vortex
ptp4l[4906563.713]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[4906564.704]: linreg: points 8 slope 0.999999339 intercept 123 err 31
ptp4l[4906564.704]: master offset -120 s2 freq +538 path delay 1653
ptp4l[4906564.704]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[4906565.704]: linreg: points 8 slope 0.999999340 intercept 29 err 31
ptp4l[4906565.704]: master offset -59 s2 freq +631 path delay 1653
ptp4l[4906566.704]: linreg: points 8 slope 0.999999340 intercept 3 err 31
ptp4l[4906566.704]: master offset -10 s2 freq +656 path delay 1653
ptp4l[4906567.704]: linreg: points 8 slope 0.999999339 intercept -16 err 31
ptp4l[4906567.704]: master offset 53 s2 freq +676 path delay 1653
ptp4l[4906567.720]: port 1: delay timeout
ptp4l[4906568.721]: timed out while polling for tx timestamp
ptp4l[4906568.721]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906568.721]: port 1: send delay request failed
ptp4l[4906568.721]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906568.721]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[4906584.722]: clearing fault on port 1
ptp4l[4906584.722]: config item enp1s0f0.logMinDelayReqInterval is 2
ptp4l[4906584.722]: config item enp1s0f0.logAnnounceInterval is 0
ptp4l[4906584.722]: config item enp1s0f0.announceReceiptTimeout is 4
ptp4l[4906584.722]: config item enp1s0f0.syncReceiptTimeout is 0
ptp4l[4906584.722]: config item enp1s0f0.transportSpecific is 0
ptp4l[4906584.722]: config item enp1s0f0.logSyncInterval is 0
ptp4l[4906584.722]: config item enp1s0f0.logMinPdelayReqInterval is 2
ptp4l[4906584.722]: config item enp1s0f0.neighborPropDelayThresh is 20000000
ptp4l[4906584.722]: config item enp1s0f0.min_neighbor_prop_delay is -20000000
ptp4l[4906584.722]: config item enp1s0f0.udp_ttl is 1
ptp4l[4906584.722]: driver changed our HWTSTAMP options
ptp4l[4906584.722]: tx_type 1 not 1
ptp4l[4906584.722]: rx_filter 1 not 12
ptp4l[4906584.722]: config item (null).dscp_event is 0
ptp4l[4906584.722]: config item (null).dscp_general is 0
ptp4l[4906584.722]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[4906585.704]: port 1: setting asCapable
ptp4l[4906585.714]: port 1: new foreign master 0019dd.fffe.00085c-1
ptp4l[4906587.714]: selected best master clock 0019dd.fffe.00085c
ptp4l[4906587.714]: foreign master not using PTP timescale
ptp4l[4906587.714]: running in a temporal vortex
ptp4l[4906587.714]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[4906588.705]: linreg: points 8 slope 0.999999345 intercept 506 err 37
ptp4l[4906588.705]: master offset -645 s2 freq +149 path delay 1653
ptp4l[4906588.705]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[4906589.705]: linreg: points 8 slope 0.999999347 intercept 81 err 40
ptp4l[4906589.705]: master offset -194 s2 freq +572 path delay 1653
ptp4l[4906590.705]: linreg: points 8 slope 0.999999351 intercept 67 err 43
ptp4l[4906590.705]: master offset -166 s2 freq +582 path delay 1653
ptp4l[4906591.705]: linreg: points 8 slope 0.999999356 intercept 62 err 43
ptp4l[4906591.705]: master offset -68 s2 freq +582 path delay 1653
ptp4l[4906592.417]: port 1: delay timeout
ptp4l[4906593.418]: timed out while polling for tx timestamp
ptp4l[4906593.418]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4906593.418]: port 1: send delay request failed
ptp4l[4906593.418]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[4906593.418]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[4906609.419]: clearing fault on port 1
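
To put numbers on the fault loop in the log above, here is a minimal sketch
that pulls the "SLAVE to FAULTY" timestamps out of ptp4l output and prints the
time between consecutive faults. The helper names are made up for
illustration, and the only assumption about the log format is the
"ptp4l[seconds]: message" prefix seen above. The sample lines are copied from
the log in this message.

```python
import re

# Matches the "ptp4l[<seconds>]: <message>" prefix used in the log above.
LINE_RE = re.compile(r"^ptp4l\[(\d+\.\d+)\]:\s*(.*)$")


def fault_times(lines):
    """Return the timestamp (in seconds) of every SLAVE to FAULTY transition."""
    times = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "SLAVE to FAULTY on FAULT_DETECTED" in m.group(2):
            times.append(float(m.group(1)))
    return times


def fault_intervals(times):
    """Return the gaps between consecutive faults, rounded to milliseconds."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


# The three fault transitions from the log in this message.
SAMPLE = [
    "ptp4l[4906545.303]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)",
    "ptp4l[4906561.303]: clearing fault on port 1",
    "ptp4l[4906568.721]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)",
    "ptp4l[4906584.722]: clearing fault on port 1",
    "ptp4l[4906593.418]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)",
]

if __name__ == "__main__":
    times = fault_times(SAMPLE)
    for gap in fault_intervals(times):
        print("time between faults: %.3f s" % gap)
```

On the sample lines the gaps are roughly 23.4 s and 24.7 s: the 16 second
wait before the fault is cleared, plus 7 to 9 seconds of listening and
resynchronizing before the next delay request fails.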
Rich Schmidt, CTR, USNO
--
"If you want to build a ship, don’t drum up people to collect wood and
don’t assign them tasks and work, but rather teach them to long for the
endless immensity of the sea."
- *Antoine de Saint-Exupéry*