Yes, the key difference between chrony and ntpd is that chrony does a linear
regression on the last n samples to estimate the current frequency and
offset. It figures out how many samples to keep by looking at the number of
consecutive samples which lie above or below the regression line. If there
are too many, that suggests the curve is not being well fitted by a linear
regression, and the number of samples used is decreased until the
consecutive-runs test passes. The number is then increased one at a time
until the test begins to fail again. I believe minsamples and maxsamples tell
what the minimum number of samples retained is, and the maximum number of
consecutive good samples that are used. The default at least used to be 3 for
minsamples (so a linear curve with at least some estimate of the errors can
be fit). The max used to be 64, but these are now configurable if you know
what you are doing.
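The regression-with-runs-test scheme described above can be sketched roughly
as follows. This is an illustrative sketch, not chrony's actual code; the
function names, the runs-test threshold, and the shrink-then-fit loop are
assumptions made for illustration.

```python
# Sketch (not chrony's code): fit a line to the most recent samples, and
# shrink the window if too many consecutive residuals fall on the same
# side of the fitted line. Thresholds here are illustrative assumptions.

def fit_line(ts, xs):
    """Least-squares slope (frequency) and intercept (offset) of offsets xs at times ts."""
    n = len(ts)
    mt = sum(ts) / n
    mx = sum(xs) / n
    var = sum((t - mt) ** 2 for t in ts)
    slope = sum((t - mt) * (x - mx) for t, x in zip(ts, xs)) / var
    return slope, mx - slope * mt

def max_run(ts, xs, slope, intercept):
    """Longest run of consecutive residuals with the same sign."""
    run = best = prev = 0
    for t, x in zip(ts, xs):
        sign = 1 if x - (slope * t + intercept) >= 0 else -1
        run = run + 1 if sign == prev else 1
        prev = sign
        best = max(best, run)
    return best

def adaptive_fit(ts, xs, minsamples=3, max_allowed_run=4):
    """Drop the oldest samples until the runs test passes, then fit."""
    n = len(ts)
    while n > minsamples:
        slope, inter = fit_line(ts[-n:], xs[-n:])
        if max_run(ts[-n:], xs[-n:], slope, inter) <= max_allowed_run:
            break
        n -= 1
    return fit_line(ts[-n:], xs[-n:])
```

The fitted slope is the frequency error and the line evaluated "now" gives
the current offset estimate.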

These attributes of chrony allow it to make a much better estimate of the
current offset and drift than ntpd does.

Note that every time the rate of the clock is changed, all of the retained
samples are also adjusted to reflect that change in rate. Likewise, if the
clock offset is jumped, all the retained samples are shifted to reflect that
jump. Otherwise the fitting would get all messed up.
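That bookkeeping might look roughly like this. The helper names and sign
conventions are hypothetical, chosen only to illustrate the idea of keeping
past samples consistent with a step or a rate change:

```python
# Sketch of keeping retained (time, offset) samples consistent with clock
# adjustments. Hypothetical helpers; signs are a convention chosen here.

def apply_step(samples, step):
    """After the clock is stepped by `step` seconds, shift every retained offset."""
    return [(t, offset - step) for t, offset in samples]

def apply_rate_change(samples, dfreq, t_now):
    """After the frequency is changed by `dfreq` (s/s) at time t_now,
    correct each past offset by the amount the new rate would have
    accumulated between that sample's time and now."""
    return [(t, offset - dfreq * (t - t_now)) for t, offset in samples]
```

Without such corrections, old samples would describe a clock that no longer
exists, and the regression over them would be meaningless.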


If the noise is dominated by, for example, a Poisson noise process, your
estimator might be of advantage (given the cost that you state), but in the
NTP case it is a mixture of Poisson and Gaussian. In most situations the
Gaussian probably dominates. In some cases it does not, and there a different
analysis technique might be better. But I think you really would have to run
simulation experiments, both with simulated data where some noise statistics
are chosen and with real data, to see how much difference it makes. One also
has to worry about potential instabilities in the analysis one performs.

One way of handling outliers is to simply throw them away. E.g., if a data
point is 5 sigma away from the best-fit curve, one could simply eliminate it
and try again. This is what is done when the data is accumulated and only the
median sample is passed on to chrony. Typically only the 60 or 70% of the
data points that lie closest together are used. This is to get rid of what
David Mills called popcorn noise.
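The two filters mentioned above can be sketched as follows. The function
names, the 5-sigma cutoff, and the 70% fraction are illustrative assumptions,
not chrony's actual parameters or code:

```python
import statistics

# Sketch of two simple outlier filters: an n-sigma cut on residuals, and
# keeping only the fraction of points lying closest to the median.
# Parameters are illustrative, not chrony's actual values.

def drop_sigma_outliers(residuals, nsigma=5):
    """Discard points more than nsigma standard deviations from the mean residual."""
    mu = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)
    return [r for r in residuals if abs(r - mu) <= nsigma * sd]

def keep_closest(offsets, fraction=0.7):
    """Keep the given fraction of points closest to the median, a crude
    guard against isolated 'popcorn' spikes."""
    med = statistics.median(offsets)
    ranked = sorted(offsets, key=lambda x: abs(x - med))
    return ranked[: max(1, int(len(offsets) * fraction))]
```

Both are cheap, but note that a sigma cut computed from a small sample that
includes the outlier can fail to reject it; the median-distance filter is
more robust in that case.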

The problem is that there really are no great models for the noise, and
besides, almost every implementation is faced with different noise sources.

Also, the more complex one makes the analysis, the higher the probability
that subtle (or not so subtle) bugs creep in, obviating all of the work.

William G. Unruh __| Canadian Institute for|____ Tel: +1(604)822-3273
Physics&Astronomy _|___ Advanced Research _|____ Fax: +1(604)822-5324
UBC, Vancouver,BC _|_ Program in Cosmology |____ un...@physics.ubc.ca
Canada V6T 1Z1 ____|____ and Gravity ______|_ www.theory.physics.ubc.ca/

On Wed, 17 Feb 2021, Charlie Laub wrote:


While I was reading the docs I came across these parameters:

maxsamples [samples]

    The maxsamples directive sets the default maximum number of samples that
chronyd should keep for each source. This setting can be overridden for
individual sources in the server and refclock directives. The default value
is 0, which disables the configurable limit. The useful range is 4 to 64.

    As a special case, setting maxsamples to 1 disables frequency tracking in
order to make the sources immediately selectable with only one sample. This
can be useful when chronyd is started with the -q or -Q option.

 

minsamples [samples]

    The minsamples directive sets the default minimum number of samples that
chronyd should keep for each source. This setting can be overridden for
individual sources in the server and refclock directives. The default value
is 6. The useful range is 4 to 64.

    Forcing chronyd to keep more samples than it would normally keep reduces
noise in the estimated frequency and offset, but slows down the response to
changes in the frequency and offset of the clock. The offsets in the tracking
and sourcestats reports (and the tracking.log and statistics.log files) may
be smaller than the actual offsets.

 

Maybe I am way off here, but the descriptions suggest that these retained
samples are interpolated using a linear or other form, and then the
interpolated info is used by chrony. Is that correct?

 

The offset data is obviously noisy. In addition, I have observed on my own
machines that there can be occasional outliers that are on the order of 10x
larger than usual. So the data also has outliers.

 

A linear regression is not the best way to process this kind of data.
Instead a robust analysis method is best. There is a simple and effective one
for obtaining the “best fit” slope of a dataset called the Theil-Sen
estimator. There is a great Wikipedia entry for it if you are not familiar
with the technique (not sure if links are allowed so I did not include it).
In a nutshell, the slope for all pairs of points in the dataset is computed
and the median value is selected as the estimate of the slope. It is
straightforward to use this to obtain a good estimate of the true offset for
any time within the time interval of the dataset, and to make a prediction
into the future. Because it can reject outliers and fits noisy data well, it
seems like it would be a perfect candidate for a more robust offset estimator
in chrony.
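For concreteness, a minimal sketch of the Theil-Sen estimator as described
above (the intercept convention, taking the median of x_i - slope * t_i, is
one common choice, not the only one):

```python
import statistics

# Minimal Theil-Sen sketch: the slope is the median of the slopes over all
# pairs of points; the intercept is the median of (x - slope * t).

def theil_sen(ts, xs):
    slopes = [
        (xs[j] - xs[i]) / (ts[j] - ts[i])
        for i in range(len(ts))
        for j in range(i + 1, len(ts))
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median([x - slope * t for t, x in zip(ts, xs)])
    return slope, intercept
```

On a dataset lying on the line x = 2t + 1 with one large outlier, the
estimator still recovers slope 2 and intercept 1, which is exactly the
robustness being claimed.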

 

Normally this is termed an order N^2 problem, because the slope must be
calculated for all pairs in the dataset. But to implement this in chrony, it
seems to me you only need to compute N pairs as each new offset is obtained.
This is because the previous pairwise slope values will not change, and it is
only the pairwise slopes between the single new offset value and the
existing, retained values that need to be calculated. So the overhead would
not be large, especially since the number of data points is less than
e.g. 64.
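The incremental idea above can be sketched like this: cache the pairwise
slopes, append only N new ones per sample, and discard the stale ones when
the oldest point ages out. The class and attribute names are hypothetical:

```python
import itertools
import statistics

class IncrementalTheilSen:
    """Cache pairwise slopes so each new sample costs O(N), not O(N^2)."""

    def __init__(self, maxsamples=64):
        self.maxsamples = maxsamples
        self.points = []            # list of (id, t, x), oldest first
        self.slopes = {}            # frozenset({id_a, id_b}) -> slope
        self._next_id = itertools.count()

    def add(self, t, x):
        if len(self.points) == self.maxsamples:
            # Drop the oldest point and every cached slope involving it.
            old_id, _, _ = self.points.pop(0)
            self.slopes = {k: v for k, v in self.slopes.items() if old_id not in k}
        new_id = next(self._next_id)
        # Only N new pairwise slopes, against the retained points.
        for pid, pt, px in self.points:
            self.slopes[frozenset((pid, new_id))] = (x - px) / (t - pt)
        self.points.append((new_id, t, x))

    def slope(self):
        return statistics.median(list(self.slopes.values()))
```

Note the median itself still has to be taken over all ~N^2/2 cached slopes,
so the per-sample saving is in the slope computation, not the median.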

 

Would it be worth looking into implementing this estimation method in chrony 
for predicting the current and future offsets?

 

 

-Charlie

