On Tue, Apr 25, 2017 at 4:11 AM, Miroslav Lichvar <mlich...@redhat.com> wrote:
> On Mon, Apr 24, 2017 at 03:54:25PM -0400, Chris Perl wrote:
>> 1.  If there is asymmetry, it's unlikely it is constant for the entire
>> life of the chrony process, assuming you're running chrony for a
>> reasonable period and have a reasonably designed network and your time
>> sources are located reasonably close ("reasonable" can obviously be
>> different for different people).
>
> A major source of constant asymmetry is the timestamping. Unless you
> are using HW timestamping, there can easily be an asymmetry of a few
> tens of microseconds due to interrupt coalescing and other delays in
> the kernel, driver and HW. If both ends had the same asymmetry, it
> would cancel out, but at least in my experience that's unusual. Even
> if both machines had the same HW and SW, there may be a difference due
> to the timing of the packets (i.e. the server sends a response
> immediately after receiving a request).
>
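To make the asymmetry point concrete, here is a minimal Python sketch of the standard NTP on-wire offset/delay calculation (the formulas from RFC 5905). The simulate() helper and its parameter values are hypothetical, chosen only to show that a difference between the two legs, including late timestamping on one side, turns into an offset error of half that difference, while the measured delay only reports the sum of the legs.

```python
"""Standard NTP offset/delay calculation with an asymmetric path (sketch)."""


def ntp_offset_delay(t1, t2, t3, t4):
    """t1: client transmit, t2: server receive, t3: server transmit,
    t4: client receive. Returns (server offset relative to client, delay)."""
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate(true_offset, d_fwd, d_back, t1=0.0, processing=1e-6):
    """Hypothetical helper: timestamps for one exchange where the request
    takes d_fwd and the response d_back. The delays include any
    timestamping latency, so late RX timestamps on the client inflate d_back."""
    t2 = t1 + d_fwd + true_offset          # server clock on receive
    t3 = t2 + processing                   # server clock on transmit
    t4 = t1 + d_fwd + processing + d_back  # client clock on receive
    return t1, t2, t3, t4


if __name__ == "__main__":
    # The clocks actually agree (true_offset = 0), but the response leg
    # appears 20 us longer than the request leg.
    off, delay = ntp_offset_delay(*simulate(0.0, 10e-6, 30e-6))
    print(f"offset {off * 1e6:.1f} us, delay {delay * 1e6:.1f} us")
    # prints: offset -10.0 us, delay 40.0 us
```

Nothing in the four timestamps distinguishes a 10/30 us split from a 20/20 us split with a real 10 us offset, which is why constant asymmetry can't be removed by the client on its own.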
> For example, here is a client with an Intel i210 card running two
> chronyd instances using the same server. One is using HW timestamping
> and controlling the clock, the other is using SW timestamping and just
> monitoring the server with the noselect option.
>
> # chronyc -h 127.0.0.1 -p 323 -m sources sourcestats
> 210 Number of sources = 1
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ===============================================================================
> ^* ntp1.local                    1   0   377     1     +9ns[  +38ns] +/-   26us
> 210 Number of sources = 1
> Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
> ==============================================================================
> ntp1.local                  8   5     9     +0.000      0.018     +0ns    31ns
>
> # chronyc -h 127.0.0.1 -p 324 -m sources sourcestats
> 210 Number of sources = 1
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ===============================================================================
> ^? ntp1.local                    1   0   377     4  +9740ns[+9740ns] +/-   56us
> 210 Number of sources = 1
> Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
> ==============================================================================
> ntp1.local                 24  14   354     -0.000      0.001  -8542ns   201ns
>
> While the second instance seems to be stable to within a few hundred
> nanoseconds, the measured offset has an error of about 9 microseconds.
> The extra delay of about 40 microseconds is largely asymmetric.
>
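A quick worked split of those numbers, assuming the HW-timestamped instance is close to symmetric so that all of the ~9 us offset error comes from the extra ~40 us of delay, and using the usual NTP result that the offset error is half the difference between the two one-way delays. Here d_fwd and d_back are the extra one-way delays; which leg is the longer one isn't stated, so only magnitudes are shown:

```latex
\begin{aligned}
d_{\text{fwd}} + d_{\text{back}} &\approx 40\,\mu\text{s}, \qquad
\tfrac{1}{2}\,\lvert d_{\text{fwd}} - d_{\text{back}} \rvert \approx 9\,\mu\text{s} \\
\Rightarrow\quad \max(d_{\text{fwd}}, d_{\text{back}}) &\approx 29\,\mu\text{s}, \qquad
\min(d_{\text{fwd}}, d_{\text{back}}) \approx 11\,\mu\text{s}
\end{aligned}
```

Under those assumptions roughly three quarters of the extra delay sits on one leg.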
> Don't forget to use the xleave option on your clients if their servers
> are running chrony. Even with SW timestamping it can help a lot.
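As I understand it, interleaved mode helps because in basic mode the server's transmit timestamp is written into the packet before the packet actually leaves, whereas in interleaved mode the server sends the more accurate transmit timestamp of the previous response in the next one. A simplified Python model of that idea, assuming a true offset of zero, a symmetric path, and a fixed server TX latency; the parameter values are hypothetical and the exact packet fields chrony uses are not modelled:

```python
"""Simplified model of why interleaved mode (xleave) helps with SW timestamps."""


def offset(t1, t2, t3, t4):
    """Standard NTP offset: ((t2 - t1) + (t3 - t4)) / 2."""
    return ((t2 - t1) + (t3 - t4)) / 2


def exchange(t1, path=20e-6, processing=5e-6, tx_latency=15e-6):
    """One request/response with a true offset of zero and a symmetric path.

    Returns (t1, t2, t3_early, t3_actual, t4): t3_early is the transmit
    timestamp the server writes into the packet before sending it, and
    t3_actual is when the packet really leaves (e.g. a kernel TX timestamp)."""
    t2 = t1 + path
    t3_early = t2 + processing
    t3_actual = t3_early + tx_latency
    t4 = t3_actual + path
    return t1, t2, t3_early, t3_actual, t4


def basic_and_interleaved(n=4, poll=1.0):
    exchanges = [exchange(i * poll) for i in range(n)]
    # Basic mode: the client only has the early timestamp from the packet.
    basic = [offset(t1, t2, t3e, t4) for t1, t2, t3e, _, t4 in exchanges]
    # Interleaved mode: the accurate t3 of exchange i arrives in response
    # i + 1 and is paired with the client's saved timestamps of exchange i,
    # so the most recent exchange has no interleaved result yet.
    interleaved = [offset(t1, t2, t3a, t4)
                   for t1, t2, _, t3a, t4 in exchanges[:-1]]
    return basic, interleaved


if __name__ == "__main__":
    basic, interleaved = basic_and_interleaved()
    print("basic      :", [f"{x * 1e6:+.1f} us" for x in basic])
    print("interleaved:", [f"{x * 1e6:+.1f} us" for x in interleaved])
```

In this model basic mode reports about -7.5 us (half the server's TX latency) on every exchange, while the interleaved results come out at essentially zero, one poll later.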

I tried to set up a similar experiment myself, but am seeing some weird
behavior I don't currently understand.

I am running chrony as of git commit
9d9107dcdb7768a03dc129d33b2a7a25f1eea2f5620bc85eb00cfea07c1b6075 with
your chrony-timestamping.patch from the copr repo applied (so I can
use hardware timestamping on CentOS 7.3).

My server is sync'd to a GPS appliance via PTP with linuxptp using an
X540 interface (this is running in a network namespace to hide it from
chrony, given the requirement on CentOS 7.3 that only one interface
supporting hardware timestamping be visible to chrony).

I have chrony configured with a PHC refclock, serving time via an
I350 interface with hardware timestamping enabled.

My client is configured to talk to this server via its own I350
interface, set to use hardware timestamping and using the xleave
option.

When running like this, I'm seeing offsets on the order of hundreds of
nanoseconds, a root dispersion of about 17us and a root delay of about 14us.
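As a rough cross-check of those numbers against the measurements.log excerpt further down, assuming chronyd forms its root values by adding the measured peer delay and dispersion to the server's own root delay and dispersion, and ignoring the dispersion that grows with time since the last update:

```latex
\begin{aligned}
\text{root delay} &\approx \underbrace{0}_{\text{server root delay}}
  + \underbrace{14.6\,\mu\text{s}}_{\text{peer delay}} \approx 14.6\,\mu\text{s} \\
\text{root dispersion} &\approx \underbrace{15.26\,\mu\text{s}}_{\text{server root dispersion}}
  + \underbrace{2.0\,\mu\text{s}}_{\text{peer dispersion}} \approx 17.3\,\mu\text{s}
\end{aligned}
```

The same sum applied to the 4B samples in that excerpt (peer delay about 32.7 us) gives a root delay of roughly 33 us.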

The weird part happens when I try to run a second instance of chrony
using kernel timestamps to compare against the first with hardware
timestamps.

The second instance of chrony is configured to use different paths for
everything, listens on a different command port and is not set up to
act as a server (i.e. it has no `allow' directive).  At least, I
believe it is; it's possible I've missed something.
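Since it's easy to miss a setting that silently falls back to the same default, here is a rough Python sketch for comparing the two instances' config files. The set of directives checked, and the assumption that cmdport and pidfile have compiled-in defaults that clash when neither file sets them, are my own assumptions rather than an exhaustive list; comment handling assumes whole-line comments starting with !, ;, # or %.

```python
"""Spot settings two chronyd configs would share (sketch, not exhaustive)."""
import sys

# Assumed directives that should differ between two instances on one host.
MUST_DIFFER = {"cmdport", "pidfile", "driftfile", "logdir", "dumpdir"}
# Assumed to have compiled-in defaults, so they also clash if neither sets them.
HAS_DEFAULT = {"cmdport", "pidfile"}
COMMENT_CHARS = "!;#%"


def read_directives(text):
    """Map directive name -> set of argument strings, skipping comments."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in COMMENT_CHARS:
            continue
        parts = line.split(None, 1)
        name = parts[0].lower()
        args = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(name, set()).add(args)
    return directives


def shared_settings(conf_a, conf_b):
    """Return (directive, reason) pairs the two configs appear to share."""
    a, b = read_directives(conf_a), read_directives(conf_b)
    clashes = []
    for name in sorted(MUST_DIFFER):
        va, vb = a.get(name, set()), b.get(name, set())
        if va & vb:
            clashes.append((name, "same value"))
        elif not va and not vb and name in HAS_DEFAULT:
            clashes.append((name, "both use the default"))
    return clashes


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for name, why in shared_settings(fa.read(), fb.read()):
            print(f"{name}: {why}")
```

Usage: run it with the two config file paths as arguments; no output means none of the checked directives collide.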

When I run the second instance of chrony, I see the root delay for the
first instance jump from a very consistent 14us to about 30us (the
30us is pretty consistent with another machine where I'm running a
client using kernel timestamps only).  I'm observing this by running
`chronyc tracking' in a loop every second.  Further digging reveals
that the increase in the root delay is due to an increase in the peer
delay (observed by running `chronyc ntpdata' every 1s).
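The same observation can be scripted: a minimal Python sketch that runs `chronyc ntpdata` once a second, pulls out the "Peer delay" and "Interleaved" fields, and marks samples above a threshold. The 20 us threshold is a hypothetical value picked between the ~14 us and ~30 us levels above, and the parser assumes the "Name : value" layout chronyc prints; if several sources are listed, the last one wins.

```python
"""Poll `chronyc ntpdata` once a second and flag peer-delay jumps (sketch)."""
import re
import subprocess
import time

# "Name : value" lines as printed by `chronyc tracking` and `chronyc ntpdata`.
FIELD_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def parse_fields(text):
    """Return {field name: raw value string} for every 'Name : value' line."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def seconds(value):
    """Turn a value such as '0.000014570 seconds' into a float."""
    return float(value.split()[0])


def sample(chronyc_args=(), source=None):
    """Query chronyd via chronyc; returns (peer delay in s, interleaved flag).
    chronyc_args is a hypothetical hook for pointing chronyc elsewhere."""
    cmd = ["chronyc", *chronyc_args, "ntpdata"]
    if source:
        cmd.append(source)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    fields = parse_fields(out)
    return seconds(fields["Peer delay"]), fields.get("Interleaved", "?")


def monitor(threshold=20e-6, interval=1.0, **kwargs):
    """Print one line per interval, marking delays above threshold."""
    while True:
        delay, interleaved = sample(**kwargs)
        mark = "  <-- jump" if delay > threshold else ""
        print(f"{time.strftime('%H:%M:%S')} peer delay {delay * 1e6:6.2f} us "
              f"interleaved={interleaved}{mark}")
        time.sleep(interval)
```

Calling `monitor()` from an interactive session queries whichever chronyd instance plain `chronyc` reaches by default; watching the Interleaved column next to the delay should show whether the jumps coincide with non-interleaved responses.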

I have tried varying the `minpoll' and `maxpoll' on the second
instance and have observed that the jump in the peer delay on the
first instance corresponds with the interval at which the second
instance of chrony is polling (e.g. if I set the second instance to
poll every 16s, the jump only happens about once every 16s).
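chrony's minpoll/maxpoll values are base-2 exponents (poll 4 means 16 s), so the correlation can be checked a little more formally by collapsing each run of high-delay samples into one event and comparing the median spacing with 2^poll. A minimal Python sketch; the threshold and the synthetic data are made up for illustration.

```python
"""Check whether delay jumps recur at the second instance's poll interval."""
import math
import statistics


def jump_events(samples, threshold):
    """samples: (time_s, peer_delay_s) pairs in time order.

    Returns the start time of each run of samples whose delay exceeds
    threshold; consecutive high samples count as a single event."""
    events, previous_high = [], False
    for t, delay in samples:
        high = delay > threshold
        if high and not previous_high:
            events.append(t)
        previous_high = high
    return events


def period_and_poll(events):
    """Median spacing between events and the nearest poll exponent
    (interval = 2**poll seconds), or None with fewer than two events."""
    if len(events) < 2:
        return None
    gap = statistics.median(b - a for a, b in zip(events, events[1:]))
    return gap, round(math.log2(gap))


if __name__ == "__main__":
    # Synthetic 1 Hz samples: ~14.6 us normally, ~32.7 us every 16 s.
    samples = [(t, 32.7e-6 if t % 16 == 5 else 14.6e-6) for t in range(100)]
    print(period_and_poll(jump_events(samples, threshold=20e-6)))  # (16, 4)
```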

Further, looking at the `measurements.log' produced by the first
instance, I see that when the jump occurs, chrony seems to have
received a normal (basic-mode) NTP packet (two in this case, I
guess), not an interleaved one:

2017-07-26 16:09:40 192.168.1.100      N  1 111 111 1111   0  0 1.00 -1.700e-08  1.458e-05  1.998e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:41 192.168.1.100      N  1 111 111 1111   0  0 1.00  3.000e-09  1.456e-05  1.992e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:42 192.168.1.100      N  1 111 111 1111   0  0 1.00 -3.000e-09  1.457e-05  1.993e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:43 192.168.1.100      N  1 111 111 1111   0  0 1.00 -4.300e-08  1.461e-05  1.993e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:44 192.168.1.100      N  1 111 111 1111   0  0 1.00  1.800e-08  1.479e-05  1.994e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:45 192.168.1.100      N  1 111 111 1111   0  0 1.00  5.500e-08  1.447e-05  2.004e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:46 192.168.1.100      N  1 111 111 1111   0  0 1.00 -8.000e-09  1.449e-05  2.004e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:47 192.168.1.100      N  1 111 111 1111   0  0 1.00  5.600e-08  1.461e-05  1.995e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:49 192.168.1.100      N  1 111 111 1101   0  0 1.00 -9.085e-06  3.275e-05  1.988e-06  0.000e+00  1.526e-05 50545030 4B H H
2017-07-26 16:09:50 192.168.1.100      N  1 111 111 1101   0  0 1.00 -9.043e-06  3.262e-05  1.987e-06  0.000e+00  1.526e-05 50545030 4B H H
2017-07-26 16:09:50 192.168.1.100      N  1 111 111 1111   0  0 1.00  4.300e-08  1.463e-05  1.995e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:51 192.168.1.100      N  1 111 111 1111   0  0 1.00  3.600e-08  1.461e-05  1.995e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:52 192.168.1.100      N  1 111 111 1111   0  0 1.00  1.000e-08  1.451e-05  1.990e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:53 192.168.1.100      N  1 111 111 1111   0  0 1.00  3.600e-08  1.454e-05  2.009e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:54 192.168.1.100      N  1 111 111 1111   0  0 1.00  3.000e-09  1.453e-05  2.009e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:55 192.168.1.100      N  1 111 111 1111   0  0 1.00 -6.800e-08  1.460e-05  2.004e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:56 192.168.1.100      N  1 111 111 1111   0  0 1.00  3.500e-08  1.461e-05  1.992e-06  0.000e+00  1.526e-05 50545030 4I H H
2017-07-26 16:09:57 192.168.1.100      N  1 111 111 1111   0  0 1.00 -5.000e-09  1.444e-05  1.996e-06  0.000e+00  1.526e-05 50545030 4I H H
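To pull such samples out of a longer log automatically, here is a minimal Python sketch. As I read the chrony documentation for this log, the column after the reference ID combines the NTP mode with an I (interleaved) or B (basic) flag, and the four-digit column holds the results of tests A-D, where a 0 marks a failed test (note the 1101 on the 4B lines). The field positions are taken from the excerpt above; newer chrony releases may add columns, so treat the indices as an assumption.

```python
"""Flag basic-mode (non-interleaved) samples in a chrony measurements.log."""
import sys
from dataclasses import dataclass


@dataclass
class Sample:
    when: str
    address: str
    tests_abcd: str        # e.g. "1111"; a 0 marks a failed test
    offset: float          # seconds
    peer_delay: float      # seconds
    mode: str              # e.g. "4I" (interleaved) or "4B" (basic)


def parse_line(line):
    """Parse one record; returns None for headers, banners and short lines.
    Field positions follow the log excerpt (an assumption for other versions)."""
    f = line.split()
    if len(f) < 18:
        return None
    try:
        return Sample(when=f"{f[0]} {f[1]}", address=f[2], tests_abcd=f[7],
                      offset=float(f[11]), peer_delay=float(f[12]), mode=f[17])
    except ValueError:
        return None


def basic_mode_samples(lines):
    """Return the parsed records whose mode flag ends in 'B'."""
    samples = (parse_line(line) for line in lines)
    return [s for s in samples if s is not None and s.mode.endswith("B")]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as log:
        for s in basic_mode_samples(log):
            print(f"{s.when} {s.mode} tests={s.tests_abcd} "
                  f"offset={s.offset * 1e6:+.2f} us "
                  f"delay={s.peer_delay * 1e6:.2f} us")
```

Usage: pass the path of the first instance's measurements.log as the only argument; the timestamps it prints can then be lined up against the second instance's polls.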

Fwiw, I also have the `xleave' option specified for the second instance.

So, it certainly seems like the second instance is interfering with
the first in some way.

Any thoughts about why this might be happening or where I should focus
my debugging efforts?
