ntpd "STEP storm" with small offset

Ward WIllats Tue, 24 Jul 2018 16:02:17 -0700

Hello.

I've got several thousand boxes in the field running BusyBox 1.27.2 on an 
OpenWRT-based MIPS-32 platform.


On a small number of these machines ( maybe < 10 ) our busybox ntpd hotplug 
script sometimes gets fired frequently with STEP events that contain a small 
(usually negative) time offset. So:

+---------------------+-------------------------------------------------------------------------------------------+
| eventTime(PST)      | eventData                                               
                                  |
+---------------------+-------------------------------------------------------------------------------------------+
| 2018-07-24 02:51:51 | {"message":"NTPD: STEP event: stratum=16 ppm=25 
poll_interval=512 offset=0.170130 secs"}  |
| 2018-07-24 03:36:14 | {"message":"NTPD: STEP event: stratum=16 ppm=20 
poll_interval=512 offset=-0.149882 secs"} |
| 2018-07-24 03:41:39 | {"message":"NTPD: STEP event: stratum=16 ppm=20 
poll_interval=512 offset=0.140356 secs"}  |
| 2018-07-24 03:49:16 | {"message":"NTPD: STEP event: stratum=16 ppm=20 
poll_interval=512 offset=-0.172208 secs"} |
| 2018-07-24 06:23:12 | {"message":"NTPD: STEP event: stratum=16 ppm=19 
poll_interval=512 offset=0.163183 secs"}  |
| 2018-07-24 06:32:09 | {"message":"NTPD: STEP event: stratum=16 ppm=19 
poll_interval=512 offset=-0.152102 secs"} |
| 2018-07-24 07:01:07 | {"message":"NTPD: STEP event: stratum=16 ppm=18 
poll_interval=512 offset=0.258099 secs"}  |
| 2018-07-24 07:06:01 | {"message":"NTPD: STEP event: stratum=16 ppm=18 
poll_interval=512 offset=0.144177 secs"}  |
| 2018-07-24 07:14:59 | {"message":"NTPD: STEP event: stratum=16 ppm=18 
poll_interval=512 offset=-0.335456 secs"} |
| 2018-07-24 07:41:18 | {"message":"NTPD: STEP event: stratum=16 ppm=18 
poll_interval=512 offset=0.381623 secs"}  |
| 2018-07-24 07:50:25 | {"message":"NTPD: STEP event: stratum=16 ppm=18 
poll_interval=512 offset=-0.290886 secs"} |
| 2018-07-24 07:52:59 | {"message":"NTPD: STEP event: stratum=16 ppm=18 
poll_interval=512 offset=-0.308475 secs"} |
| 2018-07-24 08:38:04 | {"message":"NTPD: STEP event: stratum=16 ppm=18 
poll_interval=512 offset=0.250189 secs"}  |
| 2018-07-24 08:46:49 | {"message":"NTPD: STEP event: stratum=16 ppm=18 
poll_interval=512 offset=-0.199948 secs"} |
| 2018-07-24 10:24:50 | {"message":"NTPD: STEP event: stratum=16 ppm=15 
poll_interval=512 offset=-0.178944 secs"} |
+---------------------+-------------------------------------------------------------------------------------------+

The WiFi connection to the Internet on the affected units don't seem to be 
particularly bad (RSSI and I/O error-wise), but...who knows?
We also see STRATUM events with very small offsets, as expected.
This is a relatively recent behavior (last few weeks) and not a lot has changed 
on our end, so wondering if the time server(s) selected might have something to 
do with this?

Reading the comments in ntpd.c, it seems like this script should be only be 
called when abs(offset) > 1.0 (STEP_THRESHOLD ?) 
(We are, in fact, counting on conservative behavior here, since a STEP event 
causes us to reset all of our internal timekeeping and we lose a small amount 
of data when this occurs.)

We can patch around this by squelching any action on our part if the abs(step) 
offset < 1, but I was wondering if:

1. You folks can glean any useful clues as to causality from the above data, or
2. This is a bug you've have seen before (and maybe fixed in the last year)?, or
3. My understanding of what STEP is supposed to mean (gross time jump) is 
imperfect, or
4. You have any other comments or suggestions for my idiot self!

Thanks,

-- Ward
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox

ntpd "STEP storm" with small offset

Reply via email to