Yes, I have tried a 100 ms tx_timestamp_timeout, and the issue still exists.
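
For reference, the two ptp4l options discussed in this thread could be set as
sketched below. The option names are from the ptp4l man page; the values are
only illustrative assumptions (100 is the timeout in milliseconds mentioned
above, and 4 is the documented default for fault_reset_interval), not a
recommended configuration.

```
[global]
# Time in milliseconds to wait for the tx timestamp from the kernel.
tx_timestamp_timeout    100
# Time between fault detection and fault reset, in seconds expressed as a
# power of two (4 means 2^4 = 16 s). "ASAP" resets the fault immediately.
fault_reset_interval    4
```

The file would be passed to ptp4l with the -f option.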

On Thu, Jun 10, 2021, 10:17 Grygorii Strashko <grygorii.stras...@ti.com>
wrote:

>
>
> On 10/06/2021 09:24, YENDstudio wrote:
> > Hi Jacob,
> >
> > Thanks for the info.
> > I have seen the fault recovery working on the slave machine, but the
> > master port of the BC machine didn't recover (no activity on the port).
> > Most of the faults occur after sending a Sync message and requesting the
> > tx timestamp, which would be sent in the Follow_Up message.
> > I have even tried a few options (resetting the state machine,
> > re-initializing the port, closing and reopening the port), but without success.
>
> Have you tried adjusting "tx_timestamp_timeout"?
>
> >
> > Br,
> > Yihenew
> >
> >
> > On Wed, Jun 9, 2021, 13:02 Jacob Keller <jacob.e.kel...@intel.com> wrote:
> >
> >
> >
> >     On 6/7/2021 1:19 PM, YENDstudio wrote:
> >      > Hello,
> >      >
> >      > I have configured one of my machines as a unicast BC, which is
> >      > synchronized to the grandmaster clock via the first of its two ports.
> >      > The second port is used to provide sync to another local machine. This
> >      > setup works for a few hours, after which one of the ports (the master port)
> >      > is marked as faulty, and it never recovers (the second machine stops
> >      > receiving sync) until I restart the ptp4l application. Yet the first
> >      > port continues syncing with the grandmaster clock.
> >      >
> >      > The fault is triggered by a timeout during polling of the tx timestamp
> >      > (sk_receive function call). As I am not able to fix this issue, I would
> >      > like to at least make the ptp application recover the port
> >      > automatically. I had tried to close and then reopen the port when it goes
> >      > to a FAULTY state, but it didn't help (the slave machine is not able to sync).
> >      >
> >
> >     Hi,
> >
> >     ptp4l already attempts recovery from a fault after the fault reset
> >     timeout. This is something like 15 seconds by default.
> >
> >     You should see it recover, something like:
> >
> >
> >      > ptp4l[1022068.490]: selected /dev/ptp2 as PTP clock
> >      > ptp4l[1022068.510]: port 1 (enp244s0f0): INITIALIZING to LISTENING on INIT_COMPLETE
> >      > ptp4l[1022068.510]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
> >      > ptp4l[1022068.510]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
> >      > ptp4l[1022070.454]: port 1 (enp244s0f0): new foreign master 527b94.fffe.96b1f3-1
> >      > ptp4l[1022074.454]: selected best master clock 527b94.fffe.96b1f3
> >      > ptp4l[1022074.454]: port 1 (enp244s0f0): LISTENING to UNCALIBRATED on RS_SLAVE
> >      > ptp4l[1022076.454]: master offset 3148999551 s0 freq      +0 path delay      1466
> >      > ptp4l[1022077.482]: master offset 3149000658 s1 freq   +1107 path delay      1615
> >      > ptp4l[1022078.029]: timed out while polling for tx timestamp
> >      > ptp4l[1022078.029]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
> >      > ptp4l[1022078.029]: port 1 (enp244s0f0): send delay request failed
> >      > ptp4l[1022078.029]: port 1 (enp244s0f0): UNCALIBRATED to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> >      > ptp4l[1022082.057]: port 1 (enp244s0f0): FAULTY to LISTENING on INIT_COMPLETE
> >
> >     ^^^
> >     Specifically here.
> >
> >      > ptp4l[1022082.455]: port 1 (enp244s0f0): new foreign master 527b94.fffe.96b1f3-1
> >      > ptp4l[1022086.455]: selected best master clock 527b94.fffe.96b1f3
> >      > ptp4l[1022086.455]: port 1 (enp244s0f0): LISTENING to UNCALIBRATED on RS_SLAVE
> >      > ptp4l[1022087.460]: master offset   -7124120 s2 freq -7123013 path delay      1615
> >      > ptp4l[1022087.460]: port 1 (enp244s0f0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
> >      > ptp4l[1022088.460]: master offset     -39903 s2 freq -2176032 path delay      1615
> >      > ptp4l[1022089.460]: master offset    2165416 s2 freq  +17316 path delay      1466
> >      > ptp4l[1022090.460]: master offset    2161742 s2 freq +663267 path delay      1615
> >      > ptp4l[1022091.460]: master offset    1503260 s2 freq +653307 path delay      1615
> >      > ptp4l[1022092.460]: master offset     850970 s2 freq +451995 path delay      1764
> >      > ptp4l[1022093.460]: master offset     398679 s2 freq +254995 path delay      2160
> >      > ptp4l[1022094.460]: master offset     143441 s2 freq +119361 path delay      2556
> >      > ptp4l[1022095.460]: master offset       2567 s2 freq  +21519 path delay     24523
> >
> >
> >     If you're seeing that but it fails to actually recover (i.e.,
> >     timestamps never begin working again), this is likely a fault of the
> >     driver or hardware for the device.
> >
>
> --
> Best regards,
> grygorii
>
_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel
