On 10/06/2021 09:24, YENDstudio wrote:
Hi Jacob,

Thanks for the info.
I have seen the fault recovery working on the slave machine. But the master 
port of the BC machine didn't recover (no activity on the port). Most of the 
faults occur after sending a Sync message and requesting the tx timestamp, which 
would be sent in the Follow_Up message.
I have even tried a few options (resetting the state machine, re-initializing 
the port, closing-and-opening the port) but without success.

Have you tried adjusting "tx_timestamp_timeout"?
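
For reference, a minimal sketch of where that option lives in a ptp4l
configuration file. The value 100 is only an illustrative assumption, not a
recommendation; the option is given in milliseconds, and a suitable value
depends on the NIC and driver.

```ini
[global]
# Time (in milliseconds) ptp4l waits for the tx timestamp from the kernel
# before reporting "timed out while polling for tx timestamp".
# 100 is an arbitrary example value.
tx_timestamp_timeout    100
```

A larger timeout only helps when the timestamp arrives late; it does not help
if the driver never delivers it at all.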


Br,
Yihenew


On Wed, Jun 9, 2021, 13:02 Jacob Keller <jacob.e.kel...@intel.com 
<mailto:jacob.e.kel...@intel.com>> wrote:



    On 6/7/2021 1:19 PM, YENDstudio wrote:
     > Hello,
     >
     > I have configured one of my machines as a unicast BC which is
     > synchronized to the grandmaster clock via the first of its two ports.
     > The second port is used to provide sync to another local machine. This
     > setup works for a few hours after which one of the ports (master port)
     > is marked as faulty, and it never recovers (the second machine stops
     > receiving sync) until I restart the ptp4l application. Yet, the first
     > port continues sync'ing with the grandmaster clock.
     >
     > The fault is triggered by a timeout during polling of tx timestamp
     > (sk_receive function call). As I am not able to fix this issue, I would
     > like to at least make the ptp application recover the port
     > automatically. I had tried to close-then-open the port when it goes to a
     > FAULTY state but it didn't help (the slave machine is not able to sync).
     >

    Hi,

    ptp4l already attempts recovery from a fault after the fault reset
    timeout. This is something like 15 seconds by default.

    You should see it recover, something like:


     > ptp4l[1022068.490]: selected /dev/ptp2 as PTP clock
     > ptp4l[1022068.510]: port 1 (enp244s0f0): INITIALIZING to LISTENING on INIT_COMPLETE
     > ptp4l[1022068.510]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
     > ptp4l[1022068.510]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
     > ptp4l[1022070.454]: port 1 (enp244s0f0): new foreign master 527b94.fffe.96b1f3-1
     > ptp4l[1022074.454]: selected best master clock 527b94.fffe.96b1f3
     > ptp4l[1022074.454]: port 1 (enp244s0f0): LISTENING to UNCALIBRATED on RS_SLAVE
     > ptp4l[1022076.454]: master offset 3148999551 s0 freq      +0 path delay      1466
     > ptp4l[1022077.482]: master offset 3149000658 s1 freq   +1107 path delay      1615
     > ptp4l[1022078.029]: timed out while polling for tx timestamp
     > ptp4l[1022078.029]: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
     > ptp4l[1022078.029]: port 1 (enp244s0f0): send delay request failed
     > ptp4l[1022078.029]: port 1 (enp244s0f0): UNCALIBRATED to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
     > ptp4l[1022082.057]: port 1 (enp244s0f0): FAULTY to LISTENING on INIT_COMPLETE

    ^^^
    Specifically here.

     > ptp4l[1022082.455]: port 1 (enp244s0f0): new foreign master 527b94.fffe.96b1f3-1
     > ptp4l[1022086.455]: selected best master clock 527b94.fffe.96b1f3
     > ptp4l[1022086.455]: port 1 (enp244s0f0): LISTENING to UNCALIBRATED on RS_SLAVE
     > ptp4l[1022087.460]: master offset   -7124120 s2 freq -7123013 path delay      1615
     > ptp4l[1022087.460]: port 1 (enp244s0f0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
     > ptp4l[1022088.460]: master offset     -39903 s2 freq -2176032 path delay      1615
     > ptp4l[1022089.460]: master offset    2165416 s2 freq  +17316 path delay      1466
     > ptp4l[1022090.460]: master offset    2161742 s2 freq +663267 path delay      1615
     > ptp4l[1022091.460]: master offset    1503260 s2 freq +653307 path delay      1615
     > ptp4l[1022092.460]: master offset     850970 s2 freq +451995 path delay      1764
     > ptp4l[1022093.460]: master offset     398679 s2 freq +254995 path delay      2160
     > ptp4l[1022094.460]: master offset     143441 s2 freq +119361 path delay      2556
     > ptp4l[1022095.460]: master offset       2567 s2 freq  +21519 path delay     24523


    If you're seeing that but it fails to actually recover (i.e.
    timestamps never begin working again), this is likely a fault of the
    driver or hardware for the device.
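
To check a long log for this pattern, something like the following could help.
It is only a rough sketch: it assumes each ptp4l message sits on a single
(unwrapped) line in the format shown in the log excerpt, and the names
`TRANSITION` and `fault_recoveries` are made up for this example.

```python
import re

# Matches ptp4l port state-transition lines, e.g.
# "ptp4l[1022082.057]: port 1 (enp244s0f0): FAULTY to LISTENING on INIT_COMPLETE"
TRANSITION = re.compile(
    r"ptp4l\[(?P<ts>\d+\.\d+)\]: port (?P<port>\d+) \([^)]*\): "
    r"(?P<old>\w+) to (?P<new>\w+) on (?P<event>\w+)"
)


def fault_recoveries(lines):
    """Return (port, fault_time, recovery_time) tuples.

    recovery_time is None when the port entered FAULTY and no later
    transition out of FAULTY was logged for it.
    """
    pending = {}   # port number -> timestamp at which it entered FAULTY
    results = []
    for line in lines:
        m = TRANSITION.search(line)
        if m is None:
            continue  # not a state-transition line
        port = int(m["port"])
        ts = float(m["ts"])
        if m["new"] == "FAULTY":
            pending[port] = ts
        elif m["old"] == "FAULTY" and port in pending:
            results.append((port, pending.pop(port), ts))
    # Ports still in FAULTY at the end of the log never recovered.
    results.extend((port, ts, None) for port, ts in pending.items())
    return results
```

Note that a port leaving FAULTY only shows that the state machine was reset.
It does not show that tx timestamps are being delivered again, so a port that
faults repeatedly right after each reset still points at the driver or
hardware.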


--
Best regards,
grygorii

