On Tue, Apr 25, 2017 at 4:11 AM, Miroslav Lichvar <mlich...@redhat.com> wrote:
> On Mon, Apr 24, 2017 at 03:54:25PM -0400, Chris Perl wrote:
>> 1. If there is asymmetry, it's unlikely it is constant for the entire
>> life of the chrony process, assuming you're running chrony for a
>> reasonable period and have a reasonably designed network and your time
>> sources are located reasonably close ("reasonable" can obviously be
>> different for different people).
>
> A major source of constant asymmetry is the timestamping. Unless you
> are using HW timestamping, there can easily be an asymmetry of a few
> tens of microseconds due to interrupt coalescing and other delays in
> the kernel, driver and HW. If both ends had the same asymmetry, it
> would cancel out, but at least in my experience that's unusual. Even
> if both machines had the same HW and SW, there may be a difference due
> to the timing of the packets (i.e. server sends a response
> immediately after receiving a request).
>
> For example, here is a client with an Intel i210 card running two
> chronyd instances using the same server. One is using HW timestamping
> and controlling the clock, the other is using SW timestamping and just
> monitoring the server with the noselect option.
>
> # chronyc -h 127.0.0.1 -p 323 -m sources sourcestats
> 210 Number of sources = 1
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ===============================================================================
> ^* ntp1.local                    1   0   377     1     +9ns[  +38ns] +/-   26us
> 210 Number of sources = 1
> Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
> ==============================================================================
> ntp1.local                  8   5     9     +0.000      0.018    +0ns    31ns
>
> # chronyc -h 127.0.0.1 -p 324 -m sources sourcestats
> 210 Number of sources = 1
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ===============================================================================
> ^? ntp1.local                    1   0   377     4  +9740ns[+9740ns] +/-   56us
> 210 Number of sources = 1
> Name/IP Address            NP  NR  Span  Frequency  Freq Skew  Offset  Std Dev
> ==============================================================================
> ntp1.local                 24  14   354     -0.000      0.001 -8542ns   201ns
>
> While the second instance seems to be stable to a few hundred
> nanoseconds, the measured offset has an error of about 9 microseconds.
> The extra delay of about 40 microseconds is largely asymmetric.
>
> Don't forget to use the xleave option on your clients if their servers
> are running chrony. Even with SW timestamping it can help a lot.
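
As an aside, here is a rough sketch of why asymmetric delay turns into
offset error. It uses the standard NTP on-wire offset and delay formulas,
not chrony's actual code, and the function names and example delays are
made up for illustration. Under this simple model the measured offset is
wrong by half the difference between the two one-way delays, so an error
of about 9 microseconds would correspond to roughly 18 microseconds of
net asymmetry.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP calculation from the four timestamps of one exchange.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (all in seconds).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, fwd_delay, rev_delay, processing=0.0, t1=0.0):
    """Hypothetical helper: build the four timestamps for given one-way delays.

    true_offset is the server clock minus the client clock.
    """
    t2 = t1 + fwd_delay + true_offset   # request arrival, read on the server clock
    t3 = t2 + processing                # response departure, server clock
    t4 = t3 - true_offset + rev_delay   # response arrival, read on the client clock
    return ntp_offset_and_delay(t1, t2, t3, t4)


# Clocks perfectly in sync, but 30 us one way and 10 us back (example values).
offset, delay = simulate_exchange(0.0, 30e-6, 10e-6)
print(f"measured offset {offset * 1e6:.1f} us, measured delay {delay * 1e6:.1f} us")
```

With symmetric delays the same helper returns the true offset, which is
the "it would cancel out" case mentioned above.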

I tried to set up a similar experiment myself, but am seeing some weird
behavior I don't currently understand.

I am running chrony as of git commit
9d9107dcdb7768a03dc129d33b2a7a25f1eea2f5620bc85eb00cfea07c1b6075 with
your chrony-timestamping.patch from the copr repo applied (so I can use
hardware timestamping on CentOS 7.3).

My server is synced to a GPS appliance via PTP with linuxptp using an
X540 interface (this is running in a network namespace to hide it from
chrony, because of the requirement of having only one interface that
supports hardware timestamping when running on CentOS 7.3). I have
chrony configured to use a PHC refclock, to serve time via an I350
interface, and to use hardware timestamping.

My client is configured to talk to this server via its own I350
interface, set to use hardware timestamping and using the xleave option.

When running like this, I'm seeing offsets of a few hundred
nanoseconds, a root dispersion of about 17us and a root delay of about
14us.

The weird part happens when I try to run a second instance of chrony
using kernel timestamps to compare against the first with hardware
timestamps. The second instance of chrony is configured to use
different paths for everything, listens on a different command port and
is not set up to act as a server (i.e. it has no `allow' directive).
Or, at least I believe it is; it's possible I've missed something.

When I run the second instance of chrony, I see the root delay for the
first instance jump from a very consistent 14us to about 30us (the 30us
is pretty consistent with another machine where I'm running a client
using kernel timestamps only). I'm observing this by running
`chronyc tracking' in a loop every second.

Further digging reveals that the increase in the root delay is due to an
increase in the peer delay (observed by running `chronyc ntpdata' every
1s).
I have tried varying the `minpoll' and `maxpoll' on the second
instance and have observed that the jump in the peer delay on the first
instance corresponds with the interval at which the second instance of
chrony is polling (e.g. if I set the second instance to poll every 16s,
the jump only happens about once every 16s).

Further, looking at the `measurements.log' produced by the first
instance, I see that when the jump occurs, it looks like chrony received
a normal NTP packet (in this case two, I guess), not an interleaved one:

2017-07-26 16:09:40 192.168.1.100 N 1 111 111 1111 0 0 1.00 -1.700e-08 1.458e-05 1.998e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:41 192.168.1.100 N 1 111 111 1111 0 0 1.00 3.000e-09 1.456e-05 1.992e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:42 192.168.1.100 N 1 111 111 1111 0 0 1.00 -3.000e-09 1.457e-05 1.993e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:43 192.168.1.100 N 1 111 111 1111 0 0 1.00 -4.300e-08 1.461e-05 1.993e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:44 192.168.1.100 N 1 111 111 1111 0 0 1.00 1.800e-08 1.479e-05 1.994e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:45 192.168.1.100 N 1 111 111 1111 0 0 1.00 5.500e-08 1.447e-05 2.004e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:46 192.168.1.100 N 1 111 111 1111 0 0 1.00 -8.000e-09 1.449e-05 2.004e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:47 192.168.1.100 N 1 111 111 1111 0 0 1.00 5.600e-08 1.461e-05 1.995e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:49 192.168.1.100 N 1 111 111 1101 0 0 1.00 -9.085e-06 3.275e-05 1.988e-06 0.000e+00 1.526e-05 50545030 4B H H
2017-07-26 16:09:50 192.168.1.100 N 1 111 111 1101 0 0 1.00 -9.043e-06 3.262e-05 1.987e-06 0.000e+00 1.526e-05 50545030 4B H H
2017-07-26 16:09:50 192.168.1.100 N 1 111 111 1111 0 0 1.00 4.300e-08 1.463e-05 1.995e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:51 192.168.1.100 N 1 111 111 1111 0 0 1.00 3.600e-08 1.461e-05 1.995e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:52 192.168.1.100 N 1 111 111 1111 0 0 1.00 1.000e-08 1.451e-05 1.990e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:53 192.168.1.100 N 1 111 111 1111 0 0 1.00 3.600e-08 1.454e-05 2.009e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:54 192.168.1.100 N 1 111 111 1111 0 0 1.00 3.000e-09 1.453e-05 2.009e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:55 192.168.1.100 N 1 111 111 1111 0 0 1.00 -6.800e-08 1.460e-05 2.004e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:56 192.168.1.100 N 1 111 111 1111 0 0 1.00 3.500e-08 1.461e-05 1.992e-06 0.000e+00 1.526e-05 50545030 4I H H
2017-07-26 16:09:57 192.168.1.100 N 1 111 111 1111 0 0 1.00 -5.000e-09 1.444e-05 1.996e-06 0.000e+00 1.526e-05 50545030 4I H H

FWIW, I also have the `xleave' option specified for the second
instance.

So, it certainly seems like the second instance is interfering with the
first in some way.

Any thoughts about why this might be happening or where I should focus
my debugging efforts?

--
To unsubscribe email chrony-users-requ...@chrony.tuxfamily.org with "unsubscribe" in the subject.
For help email chrony-users-requ...@chrony.tuxfamily.org with "help" in the subject.
Trouble? Email listmas...@chrony.tuxfamily.org.
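
In case it helps anyone reproduce this, here is a rough Python sketch for
picking the non-interleaved samples out of a measurements.log. It
assumes the whitespace-separated layout of the lines quoted above (20
columns, offset in the 12th, peer delay in the 13th, and the mode in the
18th, where I am reading a trailing `B' as basic and `I' as
interleaved). The function names are my own, not anything from chrony.

```python
def parse_measurement(line):
    """Parse one measurements.log line; return None if it does not fit.

    The column positions are an assumption based on the sample lines.
    """
    fields = line.split()
    if len(fields) != 20:
        return None  # header, separator, or an unexpected layout
    try:
        offset = float(fields[11])
        peer_delay = float(fields[12])
    except ValueError:
        return None
    mode = fields[17]
    return {
        "time": fields[0] + " " + fields[1],
        "address": fields[2],
        "offset": offset,
        "peer_delay": peer_delay,
        "mode": mode,
        "interleaved": mode.endswith("I"),
    }


def basic_samples(lines):
    """Return the parsed samples that were not measured in interleaved mode."""
    parsed = (parse_measurement(line) for line in lines)
    return [s for s in parsed if s is not None and not s["interleaved"]]


# Two of the lines from the log above, one interleaved and one basic.
sample_lines = [
    "2017-07-26 16:09:47 192.168.1.100 N 1 111 111 1111 0 0 1.00 5.600e-08 "
    "1.461e-05 1.995e-06 0.000e+00 1.526e-05 50545030 4I H H",
    "2017-07-26 16:09:49 192.168.1.100 N 1 111 111 1101 0 0 1.00 -9.085e-06 "
    "3.275e-05 1.988e-06 0.000e+00 1.526e-05 50545030 4B H H",
]

for s in basic_samples(sample_lines):
    print(s["time"], s["mode"], f"peer delay {s['peer_delay'] * 1e6:.2f} us")
```

Passing an open file object instead of `sample_lines' should work the
same way, since the helper only iterates over lines.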