Re: [Bloat] Graph of bloat

Hal Murray Thu, 09 Jul 2015 03:08:47 -0700

There are several parts to this discussion.

Leap seconds are ugly.  The basic problem is that POSIX pretends they don't 
exist.  That's a carryover from the early days when computer time keeping 
didn't have to worry about them.  They weren't introduced until 1972.  There 
should be a second labeled 23:59:60 but most systems just set the clock back 
a second and repeat 23:59:59, and all sorts of systems get in trouble when 
time goes backwards.


They don't impact daily life like leap years do, so we don't teach kids about 
them when they learn about leap years.  Most people don't even know they exist, 
and that includes most programmers.  An additional complication is that they 
are unpredictable so you can't wire simple conversions into a chunk of code 
that gets copied around.

Google decided that it was simpler to "smear" their clocks rather than chase 
down and fix the bugs in all their code.
  Time, technology and leaping seconds
  http://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html
The downside is that all their clocks are off by up to 1/2 second.  If you 
don't need accurate time for legal reasons like stock market trading, their 
approach is probably a good one.  Their internal clocks will all agree with 
each other, but they won't agree with outside systems that aren't playing the 
smearing game.

The blog above describes the smear using cosine - no sharp corners.  The graph 
shows a linear smear.
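
For comparison, here is a rough sketch of the two smear shapes.  The cosine 
form follows the formula in the blog post; the linear form is my reading of 
the graph.  The function names and the 20 hour window are assumptions for 
illustration, not Google's actual code.

```python
import math

LEAP = 1.0  # seconds being inserted

def linear_smear(t, window):
    """Seconds of the leap absorbed t seconds into the window (straight ramp)."""
    t = min(max(t, 0.0), window)  # clamp outside the smear window
    return LEAP * t / window

def cosine_smear(t, window):
    """Same idea, but eased in and out so the rate has no sharp corners."""
    t = min(max(t, 0.0), window)
    return LEAP * (1.0 - math.cos(math.pi * t / window)) / 2.0

if __name__ == "__main__":
    window = 20 * 3600.0  # assumed 20 hour smear
    for hours in (0, 5, 10, 15, 20):
        t = hours * 3600.0
        print(hours, round(linear_smear(t, window), 4),
              round(cosine_smear(t, window), 4))
```

Both curves start at 0 and end at the full second.  They differ only in how 
fast the clock is being pushed at the start and end of the window.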


> Does ntp adjust system time backward based on getting nearly all it's
> samples with well over a 1/2 second of induced delay? 

The idea with smearing is to avoid having to set the clock back.  The reference 
time on that graph is UTC.  If your server was using only Google's NTP servers, 
it would follow that ramp, inserting the leap second over 20 hours rather than 
all at once by setting the clock back.  That's the whole point of the smear.  
You lie to all your NTP clients and they all follow the same lie.

All that has nothing to do with bloat.  It's just background for why I was 
making the graph.

--------

Now for NTP...

After the typical NTP client-server exchange, the client has 4 time stamps: 
send and receive for packets going in both directions.  If you look at things 
in the right way, you have 2 equations and 3 unknowns (the clock offset and 
the transit time in each direction).  You need one more equation to sort 
things out.

If you assume that the clocks on both ends are accurate, you can compute the 
network transit times in both directions.

NTP makes the assumption that the network delays are symmetric.  Without bloat, 
that's generally reasonable.  It does screw up on long links with asymmetric 
routing.  If you watch NTP servers over a long distance, you can see steps when 
the routing changes.  On the scale of bloat, those errors are minor.  If you 
had a fast link rather than my slow DSL link they would be significant.
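
A minimal sketch of the arithmetic, using the standard NTP offset and delay 
formulas.  The function name and the sample numbers are made up for 
illustration.  The second call shows what happens when the return path picks 
up half a second of queueing that the outbound path doesn't have.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Offset and round trip delay from one exchange.

    t1: client send      (client clock)
    t2: server receive   (server clock)
    t3: server send      (server clock)
    t4: client receive   (client clock)
    Assumes the transit time is the same in both directions.
    """
    delay = (t4 - t1) - (t3 - t2)            # time on the wire, both ways
    offset = ((t2 - t1) + (t3 - t4)) / 2.0   # server clock minus client clock
    return offset, delay

if __name__ == "__main__":
    # Both clocks correct, 10 ms each way, 1 ms server processing.
    print(ntp_offset_delay(0.0, 0.010, 0.011, 0.021))
    # Same clocks, but 510 ms on the way back (hypothetical bloated download).
    print(ntp_offset_delay(0.0, 0.010, 0.011, 0.521))
```

In the second case the clocks still agree, but the computed offset is about 
-0.25 s: half of the asymmetry shows up as a bogus clock error.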

ntpd remembers the last 8 samples to each server.  It only uses the one with 
the lowest round trip time, assuming that the others hit some sort of queueing 
delay.  That filters out occasional bursts of interference or even bloat.  It 
doesn't work for sustained bloat.
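
Roughly, the idea looks like this.  This is a simplified sketch, not ntpd's 
actual clock filter (the real one also considers sample age and other 
statistics); the class name is hypothetical.

```python
from collections import deque

class MinDelayFilter:
    """Keep the last few (offset, delay) samples; trust the lowest delay."""

    def __init__(self, size=8):
        self.samples = deque(maxlen=size)  # oldest sample falls off the end

    def add(self, offset, delay):
        self.samples.append((offset, delay))

    def best(self):
        # The sample with the shortest round trip probably saw the least queueing.
        return min(self.samples, key=lambda s: s[1])

if __name__ == "__main__":
    f = MinDelayFilter()
    f.add(0.001, 0.020)            # one clean sample
    for _ in range(7):
        f.add(-0.25, 0.520)        # a burst of bloated samples
    print(f.best())                # the clean sample still wins
    f.add(-0.25, 0.520)            # one more: the clean sample ages out
    print(f.best())                # now only bloated samples are left
```

Once all 8 slots hold bloated samples, the "best" one is still bad, which is 
why this doesn't help with sustained bloat.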

The huff-n-puff filter can be used for sustained bloat - better to coast than 
get confused.  But there needs to be some limit on how long to wait before 
assuming the current timings are valid because the network has been 
reconfigured.  If your bloat lasts long enough, ntpd will get confused.


In addition to getting the time correct, ntpd is also trying to calibrate the 
clock frequency so the future time will be more accurate (if the current time 
is good).  That's the "drift".  Without that correction, the clock will drift 
farther from the true time the longer you wait.

Ballpark numbers for the errors in crystals are 10s of PPM (parts per million). 
One PPM is about 86 ms per day, or a second in roughly 12 days, so an 
uncorrected clock is likely to drift by a second or more per day.  I have one 
system that's off by 138 PPM.  (The drift can also correct for minor errors 
in software.)
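
The PPM arithmetic is simple enough to check by hand.  A small sketch, with 
made-up function names:

```python
SECONDS_PER_DAY = 86400

def drift_per_day(ppm):
    """Seconds gained or lost per day by a clock whose frequency is off by ppm."""
    return ppm * 1e-6 * SECONDS_PER_DAY

def days_to_drift(seconds, ppm):
    """Days until an uncorrected clock is off by the given number of seconds."""
    return seconds / drift_per_day(ppm)

if __name__ == "__main__":
    print(drift_per_day(1))       # one PPM, in seconds per day
    print(days_to_drift(1, 1))    # days for one PPM to add up to a second
    print(drift_per_day(138))     # the 138 PPM system mentioned above
```

One PPM works out to 0.0864 seconds per day, and the 138 PPM system would 
drift about 12 seconds per day if left uncorrected.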

Normally, ntpd is just making minor corrections.  It does that by slewing the 
clock, that is by fudging the clock frequency so the clock will "drift" in the 
desired direction.  That takes a long time to make large corrections.  ntpd 
will normally step the clock if the correction is over 128 ms.
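
To put numbers on "a long time", here is a sketch of the step-or-slew 
decision.  The 128 ms threshold is the one above.  The 500 PPM maximum slew 
rate is an assumption (it is the commonly cited ntpd limit), and the real 
clock discipline is slower than this, so treat the slew time as a lower 
bound.  The function name is hypothetical.

```python
STEP_THRESHOLD = 0.128   # seconds; beyond this ntpd normally steps
MAX_SLEW_PPM = 500.0     # assumed maximum slew rate

def correction_plan(offset):
    """Return ("step", 0.0) or ("slew", minimum seconds needed to slew)."""
    if abs(offset) > STEP_THRESHOLD:
        return ("step", 0.0)
    return ("slew", abs(offset) / (MAX_SLEW_PPM * 1e-6))

if __name__ == "__main__":
    print(correction_plan(0.010))      # small error: slew
    print(correction_plan(0.100))      # still under the threshold: slew
    print(correction_plan(-0.259747))  # over the threshold: step
```

Even at the full assumed rate, a 100 ms correction takes at least 200 seconds 
of slewing.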

But stepping the clock backwards is what causes most of the problems.  ntpd has 
command line switches to don't-do-that, and another to allow one step at 
startup time...  There are no simple answers.

--------

> Judging from that graphic... I don't think huff and puff was designed for
> the bufferbloated era! so the question remains, in hal's tests, did ntp
> adjust the clock backwards? 

No.  The system that collected that data was getting time from a good local GPS 
clock.  It helps to have a place to stand if you want to collect time data.

Here is a typical pattern from a system using the pool without any huff-n-puff 
while I did a big download.
 8 Jul 22:02:17 ntpd[26705]: 0.0.0.0 061c 0c clock_step -0.259747 s
 8 Jul 23:06:24 ntpd[26705]: 0.0.0.0 061c 0c clock_step +0.274448 s


-- 
These are my opinions.  I hate spam.


