Hi team,
In large distributed networks, many factors can cause a short-term spike in
offset, primarily network equipment without Transparent Clock support (even a
single such device on the path is enough). The path delay calculation has a
filtering buffer that helps smooth out fluctuations in the measured path delay;
however, this doesn't help if, for example, only Sync messages are affected. We
often end up in a situation like this (for demonstration, we set
delay_filter_length = 1):

Apr 21 09:21:29 ptp4l[1732497.662]: master offset         17 s2 freq  -12361 path delay      4070
Apr 21 09:21:30 ptp4l[1732498.662]: master offset          0 s2 freq  -12373 path delay      4074
Apr 21 09:21:31 ptp4l[1732499.662]: master offset         37 s2 freq  -12336 path delay      4067
Apr 21 09:21:32 ptp4l[1732500.662]: master offset          3 s2 freq  -12359 path delay      4067
Apr 21 09:21:33 ptp4l[1732501.662]: master offset       -122 s2 freq  -12483 path delay      4193
Apr 21 09:21:34 ptp4l[1732502.662]: master offset        119 s2 freq  -12279 path delay      4068
Apr 21 09:21:35 ptp4l[1732503.662]: master offset        -25 s2 freq  -12387 path delay      4110
Apr 21 09:21:36 ptp4l[1732504.662]: master offset         57 s2 freq  -12313 path delay      4063
Apr 21 09:21:37 ptp4l[1732505.662]: master offset        -18 s2 freq  -12371 path delay      4063
Apr 21 09:21:38 ptp4l[1732506.662]: master offset         13 s2 freq  -12345 path delay      4068
Apr 21 09:21:39 ptp4l[1732507.662]: master offset        -76 s2 freq  -12430 path delay      4107
Apr 21 09:21:40 ptp4l[1732508.662]: master offset        -24 s2 freq  -12401 path delay      4107
Apr 21 09:21:41 ptp4l[1732509.662]: master offset     279231 s2 freq +266847 path delay      4070
Apr 21 09:21:42 ptp4l[1732510.662]: master offset    -454738 s2 freq -383353 path delay    179782
Apr 21 09:21:43 ptp4l[1732511.662]: master offset     258063 s2 freq +193027 path delay   -162110
Apr 21 09:21:44 ptp4l[1732512.662]: master offset      52769 s2 freq  +65152 path delay   -162110
Apr 21 09:21:45 ptp4l[1732513.662]: master offset    -221568 s2 freq -193355 path delay     34721
Apr 21 09:21:46 ptp4l[1732514.662]: master offset      19170 s2 freq  -19087 path delay    -25061
Apr 21 09:21:47 ptp4l[1732515.662]: master offset      25906 s2 freq   -6600 path delay    -25061
Apr 21 09:21:48 ptp4l[1732516.662]: master offset     -10978 s2 freq  -35712 path delay      6064
Apr 21 09:21:49 ptp4l[1732517.662]: master offset      12336 s2 freq  -15692 path delay      6064
Apr 21 09:21:50 ptp4l[1732518.662]: master offset      18310 s2 freq   -6017 path delay      3439
Apr 21 09:21:51 ptp4l[1732519.662]: master offset      11139 s2 freq   -7695 path delay      4247
Apr 21 09:21:52 ptp4l[1732520.662]: master offset       5108 s2 freq  -10384 path delay      5614
Apr 21 09:21:53 ptp4l[1732521.662]: master offset       3093 s2 freq  -10867 path delay      5614
Apr 21 09:21:54 ptp4l[1732522.662]: master offset       2945 s2 freq  -10087 path delay      4281
Apr 21 09:21:55 ptp4l[1732523.662]: master offset        205 s2 freq  -11943 path delay      4700
Apr 21 09:21:56 ptp4l[1732524.662]: master offset       -212 s2 freq  -12299 path delay      4700
Apr 21 09:21:57 ptp4l[1732525.662]: master offset        325 s2 freq  -11825 path delay      4079
Apr 21 09:21:58 ptp4l[1732526.662]: master offset       -414 s2 freq  -12467 path delay      4287
Apr 21 09:21:59 ptp4l[1732527.662]: master offset       -142 s2 freq  -12319 path delay      4098
Apr 21 09:22:00 ptp4l[1732528.662]: master offset       -236 s2 freq  -12456 path delay      4171
Apr 21 09:22:01 ptp4l[1732529.662]: master offset       -182 s2 freq  -12473 path delay      4171
Apr 21 09:22:02 ptp4l[1732530.662]: master offset         83 s2 freq  -12262 path delay      4028
Apr 21 09:22:03 ptp4l[1732531.662]: master offset       -113 s2 freq  -12433 path delay      4126
Apr 21 09:22:04 ptp4l[1732532.662]: master offset         11 s2 freq  -12343 path delay      4057
Apr 21 09:22:05 ptp4l[1732533.662]: master offset        -94 s2 freq  -12445 path delay      4125
Apr 21 09:22:06 ptp4l[1732534.662]: master offset         73 s2 freq  -12306 path delay      4049
Apr 21 09:22:07 ptp4l[1732535.662]: master offset        -23 s2 freq  -12380 path delay      4077

As you can see, the “regular” path delay is around 4100 ns and the offset stays
within ±200 ns. Then the offset suddenly jumps to 279231 ns for a very short
time (in fact, only a single sample), and everything goes crazy.
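To illustrate why a single sample does so much damage, here is a simplified PI servo in Python. It is a sketch loosely modeled on ptp4l's PI servo, not its actual code; the constants KP = 0.7 and KI = 0.3 are assumptions (roughly what I understand the hardware-timestamping defaults to be for a 1 s sync interval). It starts from a frequency of about -12401 ppb and is fed the 279231 ns outlier from the log:

```python
# Simplified PI servo, loosely modeled on ptp4l's "pi" servo (not its real code).
# Assumed constants: roughly the hardware-timestamping defaults for a 1 s interval.
KP = 0.7
KI = 0.3


class PiServo:
    def __init__(self, drift_ppb: float) -> None:
        # Integral state: the frequency error the servo has "learned" so far.
        self.drift = drift_ppb

    def sample(self, offset_ns: float, interval_s: float = 1.0) -> float:
        """Feed one offset measurement and return the frequency adjustment in ppb."""
        ki_term = KI * offset_ns * interval_s
        ppb = KP * offset_ns + self.drift + ki_term
        # The integral part is kept, so one outlier permanently shifts the state.
        self.drift += ki_term
        return ppb


servo = PiServo(drift_ppb=-12401)
print(round(servo.sample(279231)))  # prints 266830
print(round(servo.sample(0)))       # prints 71368
```

The first call lands close to the +266847 ppb jump in the log above. More importantly, even when the next offset is exactly zero, the model keeps the frequency about 83 ppm away from where it was, because the outlier went into the integral term and has to be worked off over the following samples. That is consistent with the slow recovery visible in the log.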
The issue is further complicated because Delay_Req/Delay_Resp messages may not
be affected when Sync messages are (different queues, fabric paths, etc.). So
even with delay_filter_length set to 10 (the default), there can be a
short-term asymmetry lasting literally one packet.
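A toy model of why the delay filter cannot help in this case, under simplifying assumptions: the slave is perfectly aligned with the master, the one-way delay is a symmetric 4000 ns, the delay filter is a moving median over 10 samples (I believe moving_median is ptp4l's default delay_filter, but treat that as an assumption), and the offset at each Sync is (t2 - t1) minus the filtered path delay. The function names are hypothetical.

```python
from collections import deque
from statistics import median

# Assumptions for this toy model (not taken from ptp4l's code):
ONE_WAY_NS = 4000   # symmetric one-way delay; the true offset is 0
FILTER_LEN = 10     # same as the delay_filter_length default


def mean_path_delay(t1, t2, t3, t4):
    """Two-way delay estimate from a Sync (t1, t2) and a Delay_Req (t3, t4)."""
    return ((t2 - t1) + (t4 - t3)) / 2


def sync_offset(t1, t2, path_delay):
    """Offset computed on Sync arrival using the filtered path delay."""
    return (t2 - t1) - path_delay


def run(extra_sync_delay_ns):
    """Return (filtered delay, offset) after one Sync is held up by extra_sync_delay_ns."""
    window = deque(maxlen=FILTER_LEN)
    # Ten clean exchanges fill the moving-median window.
    for _ in range(FILTER_LEN):
        window.append(mean_path_delay(0, ONE_WAY_NS, 0, ONE_WAY_NS))
    # One Sync sits in a queue; the Delay_Req/Delay_Resp path is unaffected.
    t1, t2 = 0, ONE_WAY_NS + extra_sync_delay_ns
    window.append(mean_path_delay(t1, t2, 0, ONE_WAY_NS))
    filtered = median(window)
    return filtered, sync_offset(t1, t2, filtered)


print(run(0))       # (4000.0, 0.0)
print(run(275000))  # (4000.0, 275000.0)
```

The median absorbs the polluted delay sample, so the path delay looks perfectly stable, yet the offset carries the full 275000 ns of extra Sync delay straight into the servo. Filtering the path delay alone cannot catch this; the outlier has to be recognized at the offset level.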

Looking at the ptp4l configuration, I couldn't find anything to handle this
situation and ignore this single bad outlier.
I implemented a quick patch that mitigates this, and it works very well
(https://gist.github.com/leoleovich/5a4dff7e089bd429c5d208d9276e1683):

    May  2 14:34:26 ptp4l[2772335.049]: master offset         -9 s2 freq -10406 path delay      3957
    May  2 14:34:27 ptp4l[2772336.049]: master offset          0 s2 freq -10399 path delay      3957
    May  2 14:34:28 ptp4l[2772337.049]: master offset         -7 s2 freq -10406 path delay      3957
    May  2 14:34:30 ptp4l[2772338.805]: master offset          7 s2 freq -10395 path delay      3957
    May  2 14:34:30 ptp4l[2772339.049]: master offset         -6 s2 freq -10405 path delay      3957
    May  2 14:34:31 ptp4l[2772340.049]: master offset        -16 s2 freq -10417 path delay      3957
    May  2 14:34:32 ptp4l[2772341.049]: skip 1/2 large offset (>20000) 486196
    May  2 14:34:33 ptp4l[2772342.049]: master offset         26 s2 freq -10380 path delay      3956
    May  2 14:34:34 ptp4l[2772343.049]: master offset         20 s2 freq -10378 path delay      3956
    May  2 14:34:35 ptp4l[2772344.049]: master offset         14 s2 freq -10378 path delay      3956
    May  2 14:34:36 ptp4l[2772345.049]: master offset        -21 s2 freq -10409 path delay      3956
    May  2 14:34:37 ptp4l[2772346.049]: master offset          3 s2 freq -10391 path delay      3955

The patch prevents unnecessary servo tuning during such a short event by using
a padding technique (simply filling in the previous values). The bottom line is
that we need a way to ignore outliers in the locked state, where short-term
large jumps in offset are not expected.
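For discussion, here is a minimal sketch of the kind of gate described above, written in Python rather than taken from the patch in the gist. The 20000 ns threshold and the limit of two consecutive skips are assumptions read off the "skip 1/2 large offset (>20000)" log line, and OutlierGate and its parameters are hypothetical names. While locked, an offset beyond the threshold is replaced by the previous good offset, but only a limited number of times in a row, so a genuine step is still accepted eventually.

```python
class OutlierGate:
    """Hypothetical outlier filter: pad large offsets with the previous value."""

    def __init__(self, threshold_ns=20000, max_skips=2):
        self.threshold_ns = threshold_ns  # assumed from the log line above
        self.max_skips = max_skips        # assumed: at most 2 skips in a row
        self.skipped = 0
        self.last_offset = None

    def filter(self, offset_ns, locked=True):
        """Return the offset that should be fed to the servo."""
        is_outlier = (
            locked
            and self.last_offset is not None
            and abs(offset_ns) > self.threshold_ns
        )
        if is_outlier and self.skipped < self.max_skips:
            self.skipped += 1
            return self.last_offset  # padding: reuse the last good offset
        # Normal sample, or too many outliers in a row: accept it as real.
        self.skipped = 0
        self.last_offset = offset_ns
        return offset_ns


gate = OutlierGate()
print([gate.filter(x) for x in [-16, 486196, 26]])    # [-16, -16, 26]

step = OutlierGate()
print([step.filter(x) for x in [10, 50000, 50000, 50000]])  # [10, 10, 10, 50000]
```

The first run mirrors the patched log: the single 486196 ns sample never reaches the servo. The second shows the escape hatch: a persistent large offset is passed through after the skip budget is spent. Padding keeps the servo's update cadence; simply dropping the sample would be the other obvious choice.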
Please check this out and let me know if there is a better way to handle this 
situation or if this patch can inspire any other ideas…

Thank you in advance,
Oleg.
_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users