Hi team,

In large distributed networks, many factors can cause a short-term spike in offset, most commonly network equipment without Transparent Clock support (even a single such device on the path is enough). The path delay calculation has a filtering buffer which helps to mitigate synchronous changes in path delay, but this doesn't help if, for example, only Sync messages are affected. We often end up in a situation like this (for demonstration we set delay_filter_length = 1):

Apr 21 09:21:29 ptp4l[1732497.662]: master offset 17 s2 freq -12361 path delay 4070
Apr 21 09:21:30 ptp4l[1732498.662]: master offset 0 s2 freq -12373 path delay 4074
Apr 21 09:21:31 ptp4l[1732499.662]: master offset 37 s2 freq -12336 path delay 4067
Apr 21 09:21:32 ptp4l[1732500.662]: master offset 3 s2 freq -12359 path delay 4067
Apr 21 09:21:33 ptp4l[1732501.662]: master offset -122 s2 freq -12483 path delay 4193
Apr 21 09:21:34 ptp4l[1732502.662]: master offset 119 s2 freq -12279 path delay 4068
Apr 21 09:21:35 ptp4l[1732503.662]: master offset -25 s2 freq -12387 path delay 4110
Apr 21 09:21:36 ptp4l[1732504.662]: master offset 57 s2 freq -12313 path delay 4063
Apr 21 09:21:37 ptp4l[1732505.662]: master offset -18 s2 freq -12371 path delay 4063
Apr 21 09:21:38 ptp4l[1732506.662]: master offset 13 s2 freq -12345 path delay 4068
Apr 21 09:21:39 ptp4l[1732507.662]: master offset -76 s2 freq -12430 path delay 4107
Apr 21 09:21:40 ptp4l[1732508.662]: master offset -24 s2 freq -12401 path delay 4107
Apr 21 09:21:41 ptp4l[1732509.662]: master offset 279231 s2 freq +266847 path delay 4070
Apr 21 09:21:42 ptp4l[1732510.662]: master offset -454738 s2 freq -383353 path delay 179782
Apr 21 09:21:43 ptp4l[1732511.662]: master offset 258063 s2 freq +193027 path delay -162110
Apr 21 09:21:44 ptp4l[1732512.662]: master offset 52769 s2 freq +65152 path delay -162110
Apr 21 09:21:45 ptp4l[1732513.662]: master offset -221568 s2 freq -193355 path delay 34721
Apr 21 09:21:46 ptp4l[1732514.662]: master offset 19170 s2 freq -19087 path delay -25061
Apr 21 09:21:47 ptp4l[1732515.662]: master offset 25906 s2 freq -6600 path delay -25061
Apr 21 09:21:48 ptp4l[1732516.662]: master offset -10978 s2 freq -35712 path delay 6064
Apr 21 09:21:49 ptp4l[1732517.662]: master offset 12336 s2 freq -15692 path delay 6064
Apr 21 09:21:50 ptp4l[1732518.662]: master offset 18310 s2 freq -6017 path delay 3439
Apr 21 09:21:51 ptp4l[1732519.662]: master offset 11139 s2 freq -7695 path delay 4247
Apr 21 09:21:52 ptp4l[1732520.662]: master offset 5108 s2 freq -10384 path delay 5614
Apr 21 09:21:53 ptp4l[1732521.662]: master offset 3093 s2 freq -10867 path delay 5614
Apr 21 09:21:54 ptp4l[1732522.662]: master offset 2945 s2 freq -10087 path delay 4281
Apr 21 09:21:55 ptp4l[1732523.662]: master offset 205 s2 freq -11943 path delay 4700
Apr 21 09:21:56 ptp4l[1732524.662]: master offset -212 s2 freq -12299 path delay 4700
Apr 21 09:21:57 ptp4l[1732525.662]: master offset 325 s2 freq -11825 path delay 4079
Apr 21 09:21:58 ptp4l[1732526.662]: master offset -414 s2 freq -12467 path delay 4287
Apr 21 09:21:59 ptp4l[1732527.662]: master offset -142 s2 freq -12319 path delay 4098
Apr 21 09:22:00 ptp4l[1732528.662]: master offset -236 s2 freq -12456 path delay 4171
Apr 21 09:22:01 ptp4l[1732529.662]: master offset -182 s2 freq -12473 path delay 4171
Apr 21 09:22:02 ptp4l[1732530.662]: master offset 83 s2 freq -12262 path delay 4028
Apr 21 09:22:03 ptp4l[1732531.662]: master offset -113 s2 freq -12433 path delay 4126
Apr 21 09:22:04 ptp4l[1732532.662]: master offset 11 s2 freq -12343 path delay 4057
Apr 21 09:22:05 ptp4l[1732533.662]: master offset -94 s2 freq -12445 path delay 4125
Apr 21 09:22:06 ptp4l[1732534.662]: master offset 73 s2 freq -12306 path delay 4049
Apr 21 09:22:07 ptp4l[1732535.662]: master offset -23 s2 freq -12380 path delay 4077

As you can see, the "regular" path delay is around 4100 with an offset within ±200ns. Then the offset suddenly jumps to 279231 for a very short time (in fact only once) and everything goes crazy: the servo overcorrects and needs about 15 seconds to settle again. The issue is further complicated because Delay_Req/Delay_Resp may not be affected when Syncs are (different queues, fabric paths, etc.). So even with delay_filter_length set to 10 (the default) there may be short-term asymmetry for literally one packet. Looking at the ptp4l config, I couldn't find anything to overcome this situation and ignore this one bad outlier.
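
To make the sync-only case concrete, here is a minimal Python sketch (not linuxptp code). It assumes the simplified relations "raw path delay = mean of the Sync and Delay_Req transit times" and "offset = Sync transit time minus filtered path delay", a moving median over the last 10 raw delay samples, and a clock with zero true offset that is never steered. All names and the spike size are made up for illustration.

```python
from collections import deque
from statistics import median

TRUE_DELAY_NS = 4070   # one-way delay, same in both directions (assumed)
SPIKE_NS = 275_000     # extra queuing delay on a single Sync (made-up value)
FILTER_LEN = 10        # mirrors the default delay_filter_length


def simulate(num_samples, spiked_sync):
    """Return a list of (offset, filtered_delay) for a clock with zero true offset."""
    raw_delays = deque(maxlen=FILTER_LEN)
    results = []
    for i in range(num_samples):
        # Only the Sync direction is hit by the spike; Delay_Req is not.
        sync_transit = TRUE_DELAY_NS + (SPIKE_NS if i == spiked_sync else 0)
        req_transit = TRUE_DELAY_NS
        # Raw path delay is the mean of the two measured transit times.
        raw_delays.append((sync_transit + req_transit) / 2)
        filtered_delay = median(raw_delays)
        # Measured offset: Sync transit time minus the filtered path delay.
        offset = sync_transit - filtered_delay
        results.append((offset, filtered_delay))
    return results


if __name__ == "__main__":
    for i, (offset, delay) in enumerate(simulate(20, spiked_sync=12)):
        print(f"sample {i:2d}: offset {offset:9.0f}  path delay {delay:6.0f}")
```

In this model the median keeps the reported path delay at 4070 for every sample, yet the one spiked sample still produces an offset equal to the full spike, because the delayed Sync enters the offset calculation directly.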

I implemented a quick patch (https://gist.github.com/leoleovich/5a4dff7e089bd429c5d208d9276e1683) which can mitigate this, and it works very well:

May 2 14:34:26 ptp4l[2772335.049]: master offset -9 s2 freq -10406 path delay 3957
May 2 14:34:27 ptp4l[2772336.049]: master offset 0 s2 freq -10399 path delay 3957
May 2 14:34:28 ptp4l[2772337.049]: master offset -7 s2 freq -10406 path delay 3957
May 2 14:34:30 ptp4l[2772338.805]: master offset 7 s2 freq -10395 path delay 3957
May 2 14:34:30 ptp4l[2772339.049]: master offset -6 s2 freq -10405 path delay 3957
May 2 14:34:31 ptp4l[2772340.049]: master offset -16 s2 freq -10417 path delay 3957
May 2 14:34:32 ptp4l[2772341.049]: skip 1/2 large offset (>20000) 486196
May 2 14:34:33 ptp4l[2772342.049]: master offset 26 s2 freq -10380 path delay 3956
May 2 14:34:34 ptp4l[2772343.049]: master offset 20 s2 freq -10378 path delay 3956
May 2 14:34:35 ptp4l[2772344.049]: master offset 14 s2 freq -10378 path delay 3956
May 2 14:34:36 ptp4l[2772345.049]: master offset -21 s2 freq -10409 path delay 3956
May 2 14:34:37 ptp4l[2772346.049]: master offset 3 s2 freq -10391 path delay 3955

The patch prevents unnecessary tuning of the servo for a short period of time by using a padding technique (simply filling with previous values). The bottom line is that we need a way to ignore outliers in a locked state, where short-term large jumps in offset are not expected. Please check this out and let me know if there is a better way to handle this situation, or if this patch can inspire any other ideas.

Thank you in advance,
Oleg.
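
P.S. For readers who don't want to open the gist, the idea can be sketched roughly as below. This is illustrative Python, not the patch itself (which is C inside ptp4l). The class and parameter names are hypothetical, the 20000 threshold is taken from the "skip 1/2 large offset (>20000)" log line, and reading "1/2" as "first of at most two consecutive skips" is my assumption here.

```python
class OutlierGate:
    """Pad over a few consecutive large offsets with the last accepted value."""

    def __init__(self, threshold_ns=20_000, max_skips=2):
        self.threshold_ns = threshold_ns  # offsets beyond this count as outliers
        self.max_skips = max_skips        # consecutive outliers we will ignore
        self.skipped = 0                  # outliers ignored in the current run
        self.last_good = None             # last offset handed to the servo

    def filter(self, offset_ns):
        """Return the offset that should be fed to the servo."""
        is_outlier = abs(offset_ns) > self.threshold_ns
        can_skip = self.last_good is not None and self.skipped < self.max_skips
        if is_outlier and can_skip:
            # Ignore the sample and repeat the previous value instead.
            self.skipped += 1
            return self.last_good
        # Normal sample, or the jump persisted too long to be an outlier.
        self.skipped = 0
        self.last_good = offset_ns
        return offset_ns


if __name__ == "__main__":
    gate = OutlierGate()
    for sample in (-6, -16, 486196, 26, 20):
        print(sample, "->", gate.filter(sample))
```

The skip limit matters: if the large offset persists beyond max_skips samples, it is passed through, so a genuine time step still reaches the servo instead of being ignored forever.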
_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users