I am sorry to report that the fix proposed in the message quoted below
did not resolve the "SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)"
problem.
Red Hat Linux, with kernel 4.9.0 built from source
Intel igb driver: 5.4.0-k
Prior to compiling the kernel:
cd /usr/src/linux-4.9/drivers/net/ethernet/intel/igb
Edit igb_main.c and comment out line 5715:
/* wr32(E1000_TSICR, ack); */
It ran fine for a while, then failed as shown below. I was able to restore
operation by killing ptp4l, reloading the driver (rmmod igb; modprobe igb),
and restarting ptp4l.
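The recovery sequence above can be sketched as a small script. This is only an illustration under stated assumptions: stopping ptp4l with pkill and the restart_cmd argument are both hypothetical, since the exact ptp4l command line is not given in this message. With dry_run=True it only prints the steps.

```python
import shlex
import subprocess


def recovery_commands(restart_cmd):
    """Return the recovery steps in order: stop ptp4l, reload igb, restart ptp4l.

    restart_cmd is the caller's own ptp4l command line (an assumption here,
    so it is a required argument rather than a guess).
    """
    return [
        ["pkill", "ptp4l"],   # assumed way of killing the running ptp4l
        ["rmmod", "igb"],     # unload the driver
        ["modprobe", "igb"],  # load it again
        list(restart_cmd),    # restart ptp4l
    ]


def recover(restart_cmd, dry_run=True):
    """Print each step; run it as well when dry_run is False."""
    *prep, restart = recovery_commands(restart_cmd)
    for cmd in prep:
        print("+", shlex.join(cmd))
        if not dry_run:
            # pkill exits non-zero when nothing matched; that is not fatal here.
            subprocess.run(cmd, check=(cmd[0] != "pkill"))
    print("+", shlex.join(restart))
    if dry_run:
        return None
    # ptp4l keeps running, so start it without waiting for it to exit.
    return subprocess.Popen(restart)
```

Running the real steps requires root, and the interface may need a moment to come back up after modprobe before ptp4l is restarted.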
Here is the ptp4l log after running successfully for 26.65 hours:
ptp4l[101975.294]: master offset -58 s2 freq +831 path delay 1632
ptp4l[101976.294]: linreg: points 8 slope 0.999999144 intercept 3 err 25
ptp4l[101976.294]: master offset -10 s2 freq +853 path delay 1632
ptp4l[101976.900]: port 1: delay timeout
ptp4l[101976.910]: timed out while polling for tx timestamp
ptp4l[101976.910]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[101976.910]: port 1: send delay request failed
ptp4l[101976.910]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[101976.910]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[101992.911]: clearing fault on port 1
ptp4l[101992.911]: config item enp1s0f0.logMinDelayReqInterval is 2
ptp4l[101992.911]: config item enp1s0f0.logAnnounceInterval is 0
ptp4l[101992.911]: config item enp1s0f0.announceReceiptTimeout is 4
ptp4l[101992.911]: config item enp1s0f0.syncReceiptTimeout is 0
ptp4l[101992.911]: config item enp1s0f0.transportSpecific is 0
ptp4l[101992.911]: config item enp1s0f0.logSyncInterval is 0
ptp4l[101992.911]: config item enp1s0f0.logMinPdelayReqInterval is 2
ptp4l[101992.911]: config item enp1s0f0.neighborPropDelayThresh is 20000000
ptp4l[101992.911]: config item enp1s0f0.min_neighbor_prop_delay is -20000000
ptp4l[101992.911]: config item enp1s0f0.udp_ttl is 1
ptp4l[101992.915]: driver changed our HWTSTAMP options
ptp4l[101992.915]: tx_type 1 not 1
ptp4l[101992.915]: rx_filter 1 not 12
ptp4l[101992.915]: config item (null).dscp_event is 0
ptp4l[101992.915]: config item (null).dscp_general is 0
ptp4l[101992.915]: port 1: FAULTY to LISTENING on FAULT_CLEARED
ptp4l[101993.294]: port 1: setting asCapable
ptp4l[101993.299]: port 1: new foreign master 0019dd.fffe.00085c-1
ptp4l[101995.299]: selected best master clock 0019dd.fffe.00085c
ptp4l[101995.299]: foreign master not using PTP timescale
ptp4l[101995.299]: running in a temporal vortex
ptp4l[101995.299]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[101996.295]: linreg: points 8 slope 0.999999153 intercept 142 err 29
ptp4l[101996.295]: master offset -150 s2 freq +705 path delay 1632
ptp4l[101996.295]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[101996.635]: port 1: delay timeout
ptp4l[101996.645]: timed out while polling for tx timestamp
ptp4l[101996.645]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[101996.645]: port 1: send delay request failed
ptp4l[101996.645]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[101996.645]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[102012.645]: clearing fault on port 1
. . .
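For anyone comparing runs, here is a minimal sketch of how the fault and recovery times could be pulled out of a ptp4l log like the one above. The function name fault_cycles and the matching on the literal strings "SLAVE to FAULTY" and "clearing fault" are assumptions based only on the lines shown here, not on any ptp4l interface.

```python
import re

# Matches lines such as "ptp4l[101976.910]: port 1: send delay request failed".
LINE_RE = re.compile(r"^ptp4l\[(\d+\.\d+)\]:\s*(.*)$")


def fault_cycles(lines):
    """Return a list of (fault_time, clear_time) pairs.

    clear_time is None when the log ends before the fault is cleared.
    """
    cycles = []
    fault_at = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # wrapped continuation or unrelated line
        t, msg = float(m.group(1)), m.group(2)
        if "SLAVE to FAULTY" in msg:
            fault_at = t
        elif msg.startswith("clearing fault") and fault_at is not None:
            cycles.append((fault_at, t))
            fault_at = None
    if fault_at is not None:
        cycles.append((fault_at, None))
    return cycles


# Lines copied from the log in this message.
sample = """\
ptp4l[101976.910]: port 1: SLAVE to FAULTY on FAULT_DETECTED
(FT_UNSPECIFIED)
ptp4l[101976.910]: waiting 2^{4} seconds to clear fault on port 1
ptp4l[101992.911]: clearing fault on port 1
ptp4l[101996.645]: port 1: SLAVE to FAULTY on FAULT_DETECTED
(FT_UNSPECIFIED)
ptp4l[102012.645]: clearing fault on port 1
""".splitlines()

for fault, clear in fault_cycles(sample):
    if clear is None:
        print(f"fault at {fault:.3f}, not cleared in this log")
    else:
        print(f"fault at {fault:.3f}, cleared after {clear - fault:.3f} s")
```

On the sample lines, both clear times come out at about 16 s, which matches the "waiting 2^{4} seconds" message. The second fault follows the first clear by less than four seconds.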
Richard Schmidt, CTR
Time Service Dept.
US Naval Observatory
On Wed, Dec 21, 2016 at 4:53 PM, Richard Cochran <richardcoch...@gmail.com>
wrote:
> On Wed, Dec 21, 2016 at 04:26:16PM -0500, Rich Schmidt wrote:
> > I've been testing linuxptp for about a year (now version 1.8) and am still
> > seeing the following failure always after 8 or more days of successful
> > operation:
>
> > ptp4l[4906544.301]: port 1: delay timeout
> > ptp4l[4906545.303]: timed out while polling for tx timestamp
> > ptp4l[4906545.303]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
> > ptp4l[4906545.303]: port 1: send delay request failed
>
> I don't recall seeing this myself, but still this is the second
> such igb failure report I have received recently.
>
> I wonder whether the incorrect double TSICR acknowledge is the root
> cause. In igb_main.c we have:
>
> static void igb_tsync_interrupt(struct igb_adapter *adapter)
> {
>         struct e1000_hw *hw = &adapter->hw;
>         struct ptp_clock_event event;
>         struct timespec64 ts;
>         u32 ack = 0, tsauxc, sec, nsec, tsicr = rd32(E1000_TSICR);
>
>         ...
>
>         /* acknowledge the interrupts */
>         wr32(E1000_TSICR, ack);
> }
>
> According to the datasheet, the first rd32() should already
> acknowledge the interrupts, but the 82580 (iirc) has a bug that
> requires the additional wr32().
>
> Try removing that last line, and see if things improve...
>
> Thanks,
> Richard
>