Hi Jacob,

Thanks for the info.
I have seen fault recovery work on the slave machine, but the master
port of the BC machine did not recover (no activity on the port). Most of
the faults occur after sending a Sync message and requesting the
tx timestamp, which would be sent in the Follow_Up message.
I have also tried a few options (resetting the state machine,
re-initializing the port, closing and reopening the port), but without success.
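
To tell a port that recovered from one that stayed in FAULTY, the state transitions in the ptp4l log can be checked mechanically. The following is only a rough sketch, not part of linuxptp: the `fault_report` helper and the regular expression are assumptions based on the log format quoted further down in this thread, and the sample lines are copied from that log.

```python
import re

# Port state transitions as they appear in the quoted ptp4l log, e.g.
# "ptp4l[1022082.057]: port 1 (enp244s0f0): FAULTY to LISTENING on INIT_COMPLETE"
# (pattern inferred from that log only; other versions may format differently).
TRANSITION = re.compile(
    r"ptp4l\[(?P<ts>\d+\.\d+)\]: port (?P<port>\d+) \((?P<name>[^)]*)\): "
    r"(?P<old>[A-Z_]+) to (?P<new>[A-Z_]+) on (?P<event>[A-Z_]+)"
)


def fault_report(lines):
    """Return (recovered, stuck) for the given log lines.

    recovered: list of (port label, seconds spent in FAULTY)
    stuck:     dict of port label -> timestamp at which the port entered
               FAULTY without any later transition out of it
    """
    faulty_since = {}
    recovered = []
    for line in lines:
        m = TRANSITION.search(line)
        if m is None:
            continue  # not a state transition (offsets, warnings, ...)
        label = f"{m['port']} ({m['name']})"
        ts = float(m["ts"])
        if m["new"] == "FAULTY":
            faulty_since[label] = ts
        elif m["old"] == "FAULTY" and label in faulty_since:
            recovered.append((label, round(ts - faulty_since.pop(label), 3)))
    return recovered, faulty_since


# Sample lines taken from the log quoted below in this thread.
LOG = """\
ptp4l[1022078.029]: timed out while polling for tx timestamp
ptp4l[1022078.029]: port 1 (enp244s0f0): UNCALIBRATED to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
ptp4l[1022082.057]: port 1 (enp244s0f0): FAULTY to LISTENING on INIT_COMPLETE
""".splitlines()

if __name__ == "__main__":
    recovered, stuck = fault_report(LOG)
    print("recovered:", recovered)
    print("still faulty:", stuck)
```

In the quoted log the port enters FAULTY at 1022078.029 and leaves it at 1022082.057, about 4 seconds later. A port that never leaves FAULTY, as with the master port here, would show up in the second return value instead.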

Br,
Yihenew


On Wed, Jun 9, 2021, 13:02 Jacob Keller <jacob.e.kel...@intel.com> wrote:

>
>
> On 6/7/2021 1:19 PM, YENDstudio wrote:
> > Hello,
> >
> > I have configured one of my machines as a unicast BC which is
> > synchronized to the grandmaster clock via the first of its two ports.
> > The second port is used to provide sync to another local machine. This
> > setup works for a few hours after which one of the ports (master port)
> > is marked as faulty, and it never recovers (the second machine stops
> > receiving sync) until I restart the ptp4l application. Yet, the first
> > port continues sync'ing with the grandmaster clock.
> >
> > The fault is triggered by a timeout during polling of tx timestamp
> > (sk_receive function call). As I am not able to fix this issue, I would
> > like to at least make the ptp application recover the port
> > automatically. I had tried to close-then-open the port when it goes to a
> > FAULTY state but it didn't help (the slave machine is not able to sync).
> >
>
> Hi,
>
> ptp4l already attempts recovery from a fault after the fault reset
> timeout. This is something like 15 seconds by default.
>
> You should see it recover, something like:
>
>
> > ptp4l[1022068.490]: selected /dev/ptp2 as PTP clock
> > ptp4l[1022068.510]: port 1 (enp244s0f0): INITIALIZING to LISTENING on INIT_COMPLETE
> > ptp4l[1022068.510]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
> > ptp4l[1022068.510]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
> > ptp4l[1022070.454]: port 1 (enp244s0f0): new foreign master 527b94.fffe.96b1f3-1
> > ptp4l[1022074.454]: selected best master clock 527b94.fffe.96b1f3
> > ptp4l[1022074.454]: port 1 (enp244s0f0): LISTENING to UNCALIBRATED on RS_SLAVE
> > ptp4l[1022076.454]: master offset 3148999551 s0 freq      +0 path delay      1466
> > ptp4l[1022077.482]: master offset 3149000658 s1 freq   +1107 path delay      1615
> > ptp4l[1022078.029]: timed out while polling for tx timestamp
> > ptp4l[1022078.029]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
> > ptp4l[1022078.029]: port 1 (enp244s0f0): send delay request failed
> > ptp4l[1022078.029]: port 1 (enp244s0f0): UNCALIBRATED to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
> > ptp4l[1022082.057]: port 1 (enp244s0f0): FAULTY to LISTENING on INIT_COMPLETE
>
> ^^^
> Specifically here.
>
> > ptp4l[1022082.455]: port 1 (enp244s0f0): new foreign master 527b94.fffe.96b1f3-1
> > ptp4l[1022086.455]: selected best master clock 527b94.fffe.96b1f3
> > ptp4l[1022086.455]: port 1 (enp244s0f0): LISTENING to UNCALIBRATED on RS_SLAVE
> > ptp4l[1022087.460]: master offset   -7124120 s2 freq -7123013 path delay      1615
> > ptp4l[1022087.460]: port 1 (enp244s0f0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
> > ptp4l[1022088.460]: master offset     -39903 s2 freq -2176032 path delay      1615
> > ptp4l[1022089.460]: master offset    2165416 s2 freq  +17316 path delay      1466
> > ptp4l[1022090.460]: master offset    2161742 s2 freq +663267 path delay      1615
> > ptp4l[1022091.460]: master offset    1503260 s2 freq +653307 path delay      1615
> > ptp4l[1022092.460]: master offset     850970 s2 freq +451995 path delay      1764
> > ptp4l[1022093.460]: master offset     398679 s2 freq +254995 path delay      2160
> > ptp4l[1022094.460]: master offset     143441 s2 freq +119361 path delay      2556
> > ptp4l[1022095.460]: master offset       2567 s2 freq  +21519 path delay     24523
>
>
> If you're seeing that but it fails to actually recover (i.e.,
> timestamps never begin working again), this is likely a fault of the
> driver or hardware for the device.
>
> Thanks,
> Jake
>
>
> _______________________________________________
> Linuxptp-devel mailing list
> Linuxptp-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linuxptp-devel
>
