Yes, I have tried a 100 ms timeout, and the issue still exists.

On Thu, Jun 10, 2021, 10:17 Grygorii Strashko <grygorii.stras...@ti.com> wrote:
> On 10/06/2021 09:24, YENDstudio wrote:
> > Hi Jacob,
> >
> > Thanks for the info.
> > I have seen the fault recovery working on the slave machine. But the
> > master port of the BC machine didn't recover (no activity on the port).
> > Most of the faults occur after sending a sync message and requesting the
> > tx-timestamp which would be sent in the follow-up message.
> >
> > I have even tried a few options (resetting the state machine,
> > re-initializing the port, closing-and-opening the port) but without success.
>
> Have you tried to play with "tx_timestamp_timeout"?
>
> > Br,
> > Yihenew
> >
> > On Wed, Jun 9, 2021, 13:02 Jacob Keller <jacob.e.kel...@intel.com> wrote:
> >
> > > On 6/7/2021 1:19 PM, YENDstudio wrote:
> > > > Hello,
> > > >
> > > > I have configured one of my machines as a unicast BC which is
> > > > synchronized to the grandmaster clock via the first of its two ports.
> > > > The second port is used to provide sync to another local machine. This
> > > > setup works for a few hours after which one of the ports (master port)
> > > > is marked as faulty, and it never recovers (the second machine stops
> > > > receiving sync) until I restart the ptp4l application. Yet, the first
> > > > port continues sync'ing with the grandmaster clock.
> > > >
> > > > The fault is triggered by a timeout during polling of tx timestamp
> > > > (sk_receive function call). As I am not able to fix this issue, I would
> > > > like to at least make the ptp application recover the port
> > > > automatically. I had tried to close-then-open the port when it goes to a
> > > > FAULTY state but it didn't help (the slave machine is not able to sync).
> > >
> > > Hi,
> > >
> > > ptp4l already attempts recovery from a fault after the fault reset
> > > timeout. This is something like 15 seconds by default.
> > >
> > > You should see it recover, something like:
> > >
> > > > ptp4l[1022068.490]: selected /dev/ptp2 as PTP clock
> > > > ptp4l[1022068.510]: port 1 (enp244s0f0): INITIALIZING to LISTENING on INIT_COMPLETE
> > > > ptp4l[1022068.510]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
> > > > ptp4l[1022068.510]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
> > > > ptp4l[1022070.454]: port 1 (enp244s0f0): new foreign master 527b94.fffe.96b1f3-1
> > > > ptp4l[1022074.454]: selected best master clock 527b94.fffe.96b1f3
> > > > ptp4l[1022074.454]: port 1 (enp244s0f0): LISTENING to UNCALIBRATED on RS_SLAVE
> > > > ptp4l[1022076.454]: master offset 3148999551 s0 freq +0 path delay 1466
> > > > ptp4l[1022077.482]: master offset 3149000658 s1 freq +1107 path delay 1615
> > > > ptp4l[1022078.029]: timed out while polling for tx timestamp
> > > > ptp4l[1022078.029]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
> > > > ptp4l[1022078.029]: port 1 (enp244s0f0): send delay request failed
> > > > ptp4l[1022078.029]: port 1 (enp244s0f0): UNCALIBRATED to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> > > > ptp4l[1022082.057]: port 1 (enp244s0f0): FAULTY to LISTENING on INIT_COMPLETE
> > >
> > > ^^^
> > > Specifically here.
> > >
> > > > ptp4l[1022082.455]: port 1 (enp244s0f0): new foreign master 527b94.fffe.96b1f3-1
> > > > ptp4l[1022086.455]: selected best master clock 527b94.fffe.96b1f3
> > > > ptp4l[1022086.455]: port 1 (enp244s0f0): LISTENING to UNCALIBRATED on RS_SLAVE
> > > > ptp4l[1022087.460]: master offset -7124120 s2 freq -7123013 path delay 1615
> > > > ptp4l[1022087.460]: port 1 (enp244s0f0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
> > > > ptp4l[1022088.460]: master offset -39903 s2 freq -2176032 path delay 1615
> > > > ptp4l[1022089.460]: master offset 2165416 s2 freq +17316 path delay 1466
> > > > ptp4l[1022090.460]: master offset 2161742 s2 freq +663267 path delay 1615
> > > > ptp4l[1022091.460]: master offset 1503260 s2 freq +653307 path delay 1615
> > > > ptp4l[1022092.460]: master offset 850970 s2 freq +451995 path delay 1764
> > > > ptp4l[1022093.460]: master offset 398679 s2 freq +254995 path delay 2160
> > > > ptp4l[1022094.460]: master offset 143441 s2 freq +119361 path delay 2556
> > > > ptp4l[1022095.460]: master offset 2567 s2 freq +21519 path delay 24523
> > >
> > > If you're seeing that but it fails to actually recover (i.e.,
> > > timestamps never begin working again), this is likely a fault of the
> > > driver or hardware for the device.
>
> --
> Best regards,
> grygorii
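
To tell apart the two cases discussed above (a port that goes FAULTY and then returns to LISTENING, versus one that stays FAULTY), the log can be scanned for state transitions. The following is only a rough sketch: it assumes the log lines look like the excerpt quoted in this thread, and the names TRANSITION and stuck_faulty_ports are made up for illustration and are not part of linuxptp.

```python
import re

# Matches state-transition lines in the format seen in the quoted log, e.g.
# "ptp4l[1022078.029]: port 1 (enp244s0f0): UNCALIBRATED to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)"
# Assumption: other ptp4l versions or log prefixes may need a different pattern.
TRANSITION = re.compile(
    r"ptp4l\[(?P<ts>\d+\.\d+)\]: port (?P<port>\d+) \((?P<name>[^)]*)\): "
    r"(?P<old>[A-Z_]+) to (?P<new>[A-Z_]+) on (?P<event>[A-Z_]+)"
)


def stuck_faulty_ports(lines):
    """Return {port_number: timestamp} for ports whose last logged state is FAULTY."""
    last = {}  # port number -> (latest state, timestamp of that transition)
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            last[int(m["port"])] = (m["new"], float(m["ts"]))
    # Keep only the ports that never logged a transition out of FAULTY.
    return {port: ts for port, (state, ts) in last.items() if state == "FAULTY"}
```

Passing the lines of a saved log (any iterable of strings) returns an empty dict when every fault was followed by a reset, as in the excerpt above. A non-empty result points at a port that never left FAULTY, which is the situation described for the BC master port.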
_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel