> On Dec 05, 2016, at 01:05, Miroslav Lichvar <mlich...@redhat.com> wrote:
> If I understand your change correctly, it increases the minimum
> acceptable delay, which increases the number of combined readings and
> makes the offset more stable. I'm worried on a busy system bad
> readings could be included and disrupt the mean value.
On busier systems, I was actually seeing more error than on quiet systems.
This change brings the noise and spike level on the busy systems down to the
same level as the quiet ones.
The change does a couple of things. First, it ignores bad slots. On all the
systems I’ve tested, slot one is always the worst, with long delays that
result in a baseline skew. Second, it avoids using the SysPrecision variable
as a gate. Gating on SysPrecision tends to select the wider slots for
averaging while ignoring the consistent ones. In my testing on busy systems,
it discarded many slots, sometimes all but two, which is obviously bad for
averaging. My first attempt was simply to drop the averaging and use the best
slot. That already improved the noise over the prior approach, but averaging
the slots whose delay is within 10% of the best is smoother.
FWIW, I haven’t dug into how SysPrecision is calculated, but looking at
several systems, it appears to be inconsistent across identical hardware.
> Note that if you save the offset between PHC and system clock to a
> double, you will lose precision when the offset is large as double has
> a 53-bit precision.
I agree. I think all of this should be done with 64-bit integers, but the
existing code used floating point, so I stayed with that to fit in. The use of
doubles seems rather pervasive throughout chrony, which surprised me. I think
it makes sense to use integer math as much as possible, at least for anything
to do with time intervals, but it seemed a bit much to rewrite everything to
address this one issue. :)