Re: [ntp:questions] NTP vs chrony comparison (Was: oscillations in ntp clock synchronization)

2008-01-28 Thread Unruh
[EMAIL PROTECTED] (Danny Mayer) writes:

David L. Mills wrote:
 Danny,
 
 It doesn't stop working; it just clamps whatever it gets to +-500 PPM as 
 appropriate. If the intrinsic error is greater than 500 PPM, the loop 
 will do what it can with the residual it can't correct showing as a 
 systematic time offset.
 
 Dave
 

I didn't mean to suggest that ntpd stopped running. It was that the 
clock was drifting steadily off into the sunset. I realize that if the 
problem corrected itself ntpd would bring things back to normal.

But that suggests that the drift rate of your chip became bigger than
500 PPM, which is huge. Maybe something altered the tick size
inappropriately. ntp should still have hauled the offset back to zero --
just taking longer (100 ms at 500 PPM takes about 200 s to eliminate,
which is not that long).
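
As a back-of-the-envelope check, the slew-time arithmetic looks like
this (an illustrative C snippet, not ntpd code):

    /* Illustrative only: time needed to slew out an offset at the
     * +/-500 PPM frequency limit. */
    #include <stdio.h>

    int main(void)
    {
        double offset_s = 0.100;     /* 100 ms initial offset */
        double max_slew = 500e-6;    /* 500 PPM limit */

        printf("%.0f s to remove %.3f s at %.0f PPM\n",
               offset_s / max_slew, offset_s, max_slew * 1e6);
        return 0;   /* prints: 200 s to remove 0.100 s at 500 PPM */
    }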



Danny

 Danny Mayer wrote:
 David L. Mills wrote:

 Danny,

 Unless the computer clock intrinsic frequency error is huge, the only 
 time the 500-PPM kicks in is with a 100-ms step transient and poll 
 interval 16 s. The loop still works if it hits the stops; it just can't 
 drive the offset to zero.

 Dave

 Yes, I found this out when my laptop stopped disciplining the clock and 
 was complaining about the frequency limits and I started digging into 
 the code to figure out why.

 Danny
 



Re: [ntp:questions] NTP vs chrony comparison (Was: oscillations in ntp clock synchronization)

2008-01-28 Thread Unruh
[EMAIL PROTECTED] (David Malone) writes:

Unruh [EMAIL PROTECTED] writes:

weekends. Lots of power at 10^-5 Hz and harmonics, and .7 10^-8 Hz -- more
than would be predicted by 1/f.

10^-5 Hz is about once per day. I'm not sure what .7 10^-8 Hz is - it
seems to be about once every 4.5 years? I would have assumed you'd
get power around 10^-5 Hz (daily), 10^-6 Hz (weekly) and maybe 3x10^-8
(yearly) based on a mix of environmental factors (air conditioning/heating)
and usage?

Yes, that was supposed to  be 1/week. 


   David.



Re: [ntp:questions] strange behaviour of ntp peerstats entries.

2008-01-28 Thread Unruh
David L. Mills [EMAIL PROTECTED] writes:

Danny,

True; there is an old RFC or IEN that reports the results with varying 
numbers of clock filter stages, from which the number eight was the 
best. Keep in mind these experiments were long ago and with, as I 
remember, ARPAnet sources. The choice might be different today, but 
probably would not result in great improvement in the general cases. Note 
however that the popcorn spike suppressor is a very real Internet add-on.

Oh yes, popcorn suppression is important. I agree. But the filter goes well
beyond that. My reaction is that on the one hand people keep saying how
important net load is, and that one does not want to use poll intervals
much smaller than 8 or 10, and on the other hand, ntp throws away
80-90% of the data collected. Reminds me of the story of Saul, king of the
Israelites, whose army was besieged, and he mentioned that he was thirsty.
A few of his soldiers risked everything to get through the enemy lines and
bring him water. He was so impressed that he poured it all out on the
ground, in tribute to their courage. I have always found that story an
incredible insult to their bravery instead.

The procedure does drastically reduce the variance of the delay, but does
not do much for the variance of the offset, which is of course what is
important. Just to bring up chrony again: it uses both a suppression, where
round trips greater than say 1.5 times the minimum are discarded, and data
weighted by some power of the inverse of the delay.
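
A minimal sketch of that chrony-style scheme as I understand it
(illustrative C, not chrony's actual code; the cutoff factor and the
weighting exponent are the knobs):

    /* Illustrative only: drop samples whose round trip exceeds
     * 1.5x the observed minimum, then average the surviving
     * offsets weighted by a power of the inverse delay. */
    #include <math.h>
    #include <stddef.h>

    double weighted_offset(const double *offset, const double *delay,
                           size_t n, double power)
    {
        double min_delay = delay[0];
        for (size_t i = 1; i < n; i++)
            if (delay[i] < min_delay)
                min_delay = delay[i];

        double sum = 0.0, wsum = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (delay[i] > 1.5 * min_delay)
                continue;                 /* discard outliers */
            double w = pow(1.0 / delay[i], power);
            sum  += w * offset[i];
            wsum += w;
        }
        return wsum > 0.0 ? sum / wsum : 0.0;
    }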


The number of stages may have unforeseen consequences. The filter can 
(and often does) introduce additional delay in the feedback loop. The 
loop time constant takes this into account so the impulse response is 
only marginally affected. So, the loop is really engineered for good 
response with one accepted sample in eight. Audio buffs will recognize 
that any additional samples only improve the response, since they amount 
to oversampling the signal. Audio buffs will also recognize the need for 
zeal in avoiding undersampling, which is why the poll-adjust algorithm 
is so squirrely.

Dave

Danny Mayer wrote:

 Unruh wrote:
 
[EMAIL PROTECTED] (Danny Mayer) writes:


Unruh wrote:

Brian Utterback [EMAIL PROTECTED] writes:


Unruh wrote:

David L. Mills [EMAIL PROTECTED] writes:

You might not have noticed a couple of crucial issues in the clock 
filter code.

I did notice them all. Thus my caveat. However, throwing away 80% of the
precious data you have seems excessive.

Note that a situation can arise in which one waits many more than 8
samples for another usable one. Say sample i is a good one and remains the
best for the next 7 tries. Sample i+7 is slightly worse than sample i and
thus is not picked when it comes in. But the next several samples are all
worse than it. Thus it remains the filtered one, yet is never used, because
it was not the best when it came in. This situation could keep going for a
long time, meaning that ntp suddenly has no data to do anything with for
many, many poll intervals. Surely using sample i+7 is far better than not
using any data for that length of time.
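
A simplified model of the "best of the last 8" rule makes the
starvation easier to see (illustrative C; the real clock filter is
more involved):

    /* Illustrative only: keep the last 8 samples, pick the one
     * with minimum delay, and use it only if it is newer than the
     * last sample actually used.  A stale minimum therefore blocks
     * every newer, slightly worse sample behind it. */
    #define NSTAGE 8

    struct sample { double offset, delay; long seq; };

    /* Returns the index of the pick, or -1 when the minimum-delay
     * sample was already used (the starvation case above). */
    int clock_filter_pick(const struct sample s[NSTAGE],
                          long last_used_seq)
    {
        int best = 0;
        for (int i = 1; i < NSTAGE; i++)
            if (s[i].delay < s[best].delay)
                best = i;
        return s[best].seq > last_used_seq ? best : -1;
    }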

On the contrary, it's better not to use the data at all if it's suspect. 
ntpd is designed to continue to work well even in the event of losing 
all access to external sources for extended periods.

And this could happen again. Now, since the
delays are presumably random variables, the chances of this happening are
not great (although under a condition of gradually worsening network the
chances are not that small), but since one is running ntp for millions or
billions of samples, the chance of this happening sometime becomes large. 


There are quite a few ntpd servers which are isolated and once an hour 
use ACTS to fetch good time samples. This is not rare at all.

And then promptly throw them away because they do not satisfy the minimum
condition? No, it is not best to throw away data no matter how suspect.
Data is a precious commodity and should be thrown away only if you are damn
sure it cannot help you. For example, let's say that the change in delay is
.1 of the variance of the clock. The max extra noise that delay can cause
is about .01. Yet NTP will chuck it. Now if the delay is 100 times the
variance, sure, chuck it. It probably cannot help you. The delay is a random
process, non-gaussian admittedly, and its effect on the time is also a
random process -- usually much closer to gaussian. And why was the figure of
8 chosen (the best of the last 8 tries)? Why not 1? Or 3? I suspect it
came off the top of someone's head -- let's not throw away too much stuff,
since it would make ntp unusable, but let's throw away some to feel
virtuous. Sorry for being sarcastic, but I would really like to know what
the justification was for throwing so much data away.
 
 
 No, 8 was chosen after a lot of experimentation to ensure the best 
 results over a wide range of configurations. Dave has adjusted these 
 numbers over the years and he's the person to ask.
 
 

Re: [ntp:questions] NTP vs chrony comparison (Was: oscillations in ntp clock synchronization)

2008-01-28 Thread Danny Mayer
Unruh wrote:
 [snip]
 
 But that suggests that the drift rate of your chip became bigger than
 500 PPM, which is huge. Maybe something altered the tick size
 inappropriately. ntp should still have hauled the offset back to zero --
 just taking longer (100 ms at 500 PPM takes about 200 s to eliminate,
 which is not that long).


No, it was something else entirely and not something that ntpd, chrony 
or any other application could do anything about. It's fixed now.

Danny


Re: [ntp:questions] NTP vs chrony comparison (Was: oscillations in ntp clock synchronization)

2008-01-28 Thread Bill Unruh
What was the problem?

On Mon, 28 Jan 2008, Danny Mayer wrote:

 Unruh wrote:
  [snip]
 

 No, it was something else entirely and not something that ntpd, chrony or any 
 other application could do anything about. It's fixed now.

 Danny


-- 
William G. Unruh    |  Canadian Institute for  |  Tel: +1(604)822-3273
Physics & Astronomy |  Advanced Research       |  Fax: +1(604)822-5324
UBC, Vancouver, BC  |  Program in Cosmology    |  [EMAIL PROTECTED]
Canada V6T 1Z1      |  and Gravity             |  www.theory.physics.ubc.ca/


Re: [ntp:questions] quirky adjtimex behaviour [SOLVED]

2008-01-28 Thread Serge Bets
Hello Dean and Hal,

 On Tuesday, January 22, 2008 at 1:08:00 +, Dean S. Messing wrote:

 hal-usenet wrote:
 try changing the code that reads the CMOS clock to spin in a loop
 reading it until it changes.  That will give you the time early in
 the second.

The adjtimex code is already designed to detect the exact beginning of
an RTC second, either via the /dev/rtc update-ended interrupt or by
busywaiting for the fall of the update-in-progress (UIP) flag. But
nevertheless your analysis seems sound, Hal: this tick synchronisation
probably fails for some unknown reason in Dean's case.
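
For reference, a bare-bones version of the UIP busywait (illustrative
C for x86 Linux with direct ISA access, as with --directisa; not the
actual adjtimex code):

    /* Illustrative only: busy-wait for the end of the RTC
     * update-in-progress (UIP) window on MC146818-style CMOS.
     * Needs root for ioperm(); x86 Linux only. */
    #include <sys/io.h>

    static unsigned char cmos_read(unsigned char reg)
    {
        outb(reg, 0x70);       /* select CMOS register */
        return inb(0x71);      /* read its value */
    }

    /* Returns just after a new RTC second begins: wait for UIP
     * (bit 7 of status register A) to rise, then to fall. */
    void wait_rtc_second(void)
    {
        if (ioperm(0x70, 2, 1) != 0)
            return;            /* need root for port access */
        while (!(cmos_read(0x0A) & 0x80))
            ;                  /* wait for the update to start */
        while (cmos_read(0x0A) & 0x80)
            ;                  /* falls at the start of the second */
    }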


 I just replaced version 1.23 of adjtimex with an old version 1.20 and
 the quirky behaviour disappeared.  I first noticed it on my new
 Fedora 7 with version 1.21.

Interesting: adjtimex 1.21 was the first version using by default the
/dev/rtc interrupt to detect the clock beat. The problem might be there.
Adjtimex 1.23 has an option to force the UIP method: does it show the
quirky offsets?

| # adjtimex --utc --compare=20 --interval=10 --directisa

Anyway the default /dev/rtc method is preferable. The 1.23 debug output
may reveal what's up with your interrupts:

| # adjtimex --utc --compare=1 --verbose


Serge.
-- 
Serge point Bets arobase laposte point net



Re: [ntp:questions] NTP vs chrony comparison (Was: oscillations in ntp clock synchronization)

2008-01-28 Thread David L. Mills
David,

We can argue about the Hurst parameter, which can't be truly random-walk 
as I have assumed, but the approximation is valid up to lag times of at 
least a week. However, as I have been cautioned, these plots are really 
sensitive to spectral lines due to nonuniform sampling. I was very 
careful to avoid such things.

Dave

David Malone wrote:

 Unruh [EMAIL PROTECTED] writes:
 
 
weekends. Lots of power at 10^-5 Hz and harmonics, and .7 10^-8 Hz -- more
than would be predicted by 1/f.
 
 
 10^-5 Hz is about once per day. I'm not sure what .7 10^-8 Hz is - it
 seems to be about once every 4.5 years? I would have assumed you'd
 get power around 10^-5 Hz (daily), 10^-6 Hz (weekly) and maybe 3x10^-8
 (yearly) based on a mix of environmental factors (air conditioning/heating)
 and usage?
 
   David.



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Eric
On Sun, 20 Jan 2008 17:50:41 GMT, Unruh [EMAIL PROTECTED] wrote
for the entire planet to see:

[EMAIL PROTECTED] (David Woolley) writes:

In article [EMAIL PROTECTED],
Unruh [EMAIL PROTECTED] wrote:
snip
 I would assume that ntp is giving these samples with long round trip very 
 low weight, or even
 eliminating them.

Note: if these spikes are positive, they may be the result of lost ticks.

Don't think so. I think they are 5-10 ms transmission delays. The delays
disappear if I run at maxpoll 7 rather than 10, so I suspect the router is
forgetting the addresses and taking its own sweet time about finding them
if the time between transmissions is many minutes.
chrony has a nice feature of being able to send an echo datagram to the
other machine if you want (before the ntp packet), to wake up the routers
along the way. 

There are several related effects here that I have experienced in my NTP
network.  

First is the possible ARP resolution overheads.  If the IP addresses of
your host and of the destination or default gateway are not passing traffic
frequently the ARP cache in your host or the local router can time out and
need to be reloaded on each poll.  These can be on the order of 5-10ms and
will affect only one side of the transaction's transmission delay. 

Unfortunately ARP often uses a 15 minute TTL, and default NTP uses a 17
minute poll interval.  

Then there is the whole problem that many routers all along the path
experience extra overhead on the first packet of a flow.  Route table
lookups are done by destination IP of course, but the result generally has
to be installed into the cache, or FIB, the first time a new source/dest IP
pair shows up.  This is often a 1-3 ms overhead.  And that entry doesn't
last forever either.

Then there is the MAC cache in your switches, which generally purge after
1-5 minutes.  This can often be adjusted higher, but that can sometimes
cause issues for others when they are reconfiguring part of the network.

Another issue is NATing or stateful firewalls.  There is often outbound
(or inbound) connection setup time.  Without special configuration this
often times out before twenty minutes, leading to more asymmetric delay.

I think the suggestion of a pre-poll ICMP echo is kinda interesting.  It
might be possible to limit the packet TTL to five hops or so, just warming
up your side of the network.  It might also be better to make it a mostly
standard UDP NTP packet so it matches whatever rules the intermediate
devices are applying (and you want them to remember).  QoS and policy
routing are both sensitive to port numbers, and certainly most firewalls
are protocol sensitive, so matching the initial packet attributes to the
desired high-performance packet attributes would probably help this
technique work.

To mitigate some of these effects it might not have to be done that often.
In many hierarchical network topologies it might serve just to send one
extra packet every 3-5 minutes using the same source IP/port that NTP
normally uses, to any configured server.  And it could still have a limited
TTL if desired.  That would at least keep the switch and ARP caches fresh
and depending on the design, the policy and NAT caches as well.  
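
A sketch of what such a warming probe could look like (hypothetical C;
warm_path and its parameters are invented for illustration -- this is
not an existing ntpd feature):

    /* Hypothetical "path warmer": a throwaway UDP datagram sent
     * from the already-bound NTP socket with a small TTL, purely
     * to refresh ARP/flow/NAT caches before the real poll. */
    #include <netinet/in.h>
    #include <sys/socket.h>

    int warm_path(int sock, const struct sockaddr_in *server)
    {
        int ttl = 5;                 /* warm only the nearby hops */
        unsigned char probe = 0;     /* payload is irrelevant */

        if (setsockopt(sock, IPPROTO_IP, IP_TTL, &ttl, sizeof ttl) < 0)
            return -1;
        sendto(sock, &probe, sizeof probe, 0,
               (const struct sockaddr *)server, sizeof *server);
        ttl = 64;                    /* restore a normal TTL */
        return setsockopt(sock, IPPROTO_IP, IP_TTL, &ttl, sizeof ttl);
    }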

- Eric
 



Re: [ntp:questions] strange behaviour of ntp peerstats entries.

2008-01-28 Thread Brian Utterback
Unruh wrote:

 I am also a little bit surprised that it is the delay that is used and not
 the total round-trip time. As I seem to read it, the delay is (t4-t3+t2-t1),
 i.e., it does not take into account the delay within the far machine (e.g.
 t4-t1), but
 only propagation delay. I would expect that the former might even be more
 important than the latter, but that is a pure guess -- i.e. no measurements
 on even one system to back it up. 
 Now it may be that on that rocky road to Manila the propagation delay is
 by far the most important, but on a modern LAN, especially with a low
 propagation delay of hundreds of usec rather than 100s of msec, I wonder. 
 

The calculation of the offset is not affected at all by the time between
taking the two timestamps on the remote machine. In fact, on symmetric
peers, this time could be in the thousands of seconds or more.
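
The standard on-wire calculation shows why: the server's turnaround
time (t3 - t2) cancels out of the offset and is subtracted from the
delay (illustrative C of the textbook formulas):

    /* Standard NTP on-wire computation from the four timestamps:
     * t1 = client transmit, t2 = server receive,
     * t3 = server transmit, t4 = client receive. */
    struct onwire { double offset, delay; };

    struct onwire ntp_onwire(double t1, double t2, double t3, double t4)
    {
        struct onwire r;
        r.offset = ((t2 - t1) + (t3 - t4)) / 2.0;  /* (t3-t2) cancels */
        r.delay  = (t4 - t1) - (t3 - t2);          /* turnaround removed */
        return r;
    }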

Brian Utterback



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Brian Utterback
Rick Jones wrote:
 Eric [EMAIL PROTECTED] wrote:
 Then there is the MAC cache in your switches, which generally purge
 after 1-5 minutes.  This can often be adjusted higher, but that can
 sometimes cause issues for others when they are reconfiguring part
 of the network.
 
 I suppose if STP gets involved, but just on its own, the forwarding
 table in a switch being aged should only mean that the next frame to
 that MAC will go out all (enabled) ports on the switch until that MAC
 is seen again as a source.  That shouldn't affect timing really.
 
 rick jones

There is some blurring of device type going on. The problem is not
which port to send to, but rather which MAC address to send to. This
is more of a problem with routers than switches, but with VLANs and
whatever these days, the device in question might be both.

In any case, if the needed MAC is not available, there has to be an
ARP request and response before the packet can be sent, but this
delay is not evident in the return trip for the NTP response packet,
introducing an asymmetric delay, the worst thing that can happen to
NTP.

I reported this problem many years ago and suggested using burst
at that time, but thought that it would be overkill and asked
for a way to tune it down to a smaller number of packets in the burst.
Dave was reticent and I was newer to the project then and didn't
want to push it. Perhaps it is time.

Brian Utterback



[ntp:questions] NTP Statistics

2008-01-28 Thread Steve Pearson
Hi,
I am using ntpv4.2.0 and have a question about the system statistics
interpretation on my NTP server.

The ntp page on monitoring options lists 11 'system stat' fields as
follows.
http://www.eecis.udel.edu/~mills/ntp/html/monopt.html

MJD date
time past midnight
time since restart
packets received last hour
server packets received last hour
current version packets last hour
previous version packets last hour
access denied packets last hour
bad length or format packets last hour
bad authentication packets last hour
rate exceeded packets last hour

A sample of my daily filegen output is as follows (12 fields):
54493 514.622 117 12 3 12 0 0 0 0 0 0
54493 4118.319 118 14 2 14 0 0 0 0 0 0
54493 7722.012 119 13 3 13 0 0 0 0 0 0

It is critical for me to understand the meaning of each of the stats
to properly monitor my NTP deployment. Can someone please point me to
more detailed descriptions or maybe just confirm and comment on my
guesses below.

My interpretation of the stats meaning (for my 1st line output listed
above) is as below:

54493 - MJD date
514.622 - UTC time past midnight in seconds
117 - time since restart (in hours? or is this just a record count?)
12 - packets received last hour (NTP req packets from clients last
hour)
3 - Server Packets received last hour (packets from other servers??)
12 - current version packets last hour (NTP req packets from clients
using same version of NTP)
0 - previous version packets last hour
0 - access denied packets last hour (not allowed to synchronize with
me???)
0 - bad length or format packets last hour
0 - bad authentication packets last hour (bad MD5 check??)
0 - rate exceeded packets last hour (exceeded the min poll rate or
some such?)
0 - extra field not described on web

Most critical for me is that I understand that packets received last
hour is really a count of NTP requests from clients. I want to use
this to get a rough idea of the load on my server to use for scaling
and monitoring.
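
In case it helps, a tiny parser for the leading fields, assuming the
field order from the monopt page quoted above (illustrative C; verify
against record_sys_stats() in your version's ntp_util.c):

    /* Illustrative only: pull "packets received last hour" out of
     * a sysstats line, taking the documented field order on faith. */
    #include <stdio.h>

    int main(void)
    {
        const char *line = "54493 514.622 117 12 3 12 0 0 0 0 0 0";
        long mjd, uptime, received;
        double secs;

        if (sscanf(line, "%ld %lf %ld %ld",
                   &mjd, &secs, &uptime, &received) == 4)
            printf("MJD %ld +%.3f s: %ld packets in the last hour\n",
                   mjd, secs, received);
        return 0;
    }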

thanks,
Steve



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Rick Jones
Eric [EMAIL PROTECTED] wrote:
 Then there is the MAC cache in your switches, which generally purge
 after 1-5 minutes.  This can often be adjusted higher, but that can
 sometimes cause issues for others when they are reconfiguring part
 of the network.

I suppose if STP gets involved, but just on its own, the forwarding
table in a switch being aged should only mean that the next frame to
that MAC will go out all (enabled) ports on the switch until that MAC
is seen again as a source.  That shouldn't affect timing really.

rick jones
-- 
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Harlan Stenn
Dave,

The problem Eric describes would seem to be most evident for LAN clients,
and on networks where there is already lots of traffic.

Recommending 'burst' for this case seems to me (without any experimental
evidence or even a read thru the code) to be counter-productive.

If a burstsize of 2 would be sufficient to address Eric's problem, I'd be
game for adding a 'shortburst' flag to handle that.  My thought is that
while a 'burstsize N' option would be more flexible, it could be abused too
easily.

H

 In article [EMAIL PROTECTED], David L. Mills [EMAIL PROTECTED] writes:

David Eric, Many years ago the Proteon routers dropped the first packet
David after the cache timed out; that was a disaster. That case and the
David ones you describe are exactly what the NTP burst mode is designed
David for. The first packet in the burst carves the caches all along the
David route and back. The clock filter algorithm tosses it out in favor of
David the remaining packets in the burst. No ICMP is needed or wanted.

David Dave

David Eric wrote:
 [snip]
 

-- 
Harlan Stenn [EMAIL PROTECTED]
http://ntpforum.isc.org  - be a member!



Re: [ntp:questions] NTP vs chrony comparison (Was: oscillations in ntp clock synchronization)

2008-01-28 Thread David L. Mills
Maarten,

Maybe I didn't make myself clear. The case in question is when the 
intrinsic frequency error of the computer clock is greater than 500 PPM, 
in which case the discipline loop cannot compensate for the error. The 
result is a systematic time offset error that cannot be driven to zero. 
This has nothing to do with the initial offset as you suggest.

Dave

Maarten Wiltink wrote:
 Unruh [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]
 
David L. Mills wrote:
 
 
Unless the computer clock intrinsic frequency error is huge, the
only time the 500-PPM kicks in is with a 100-ms step transient and
poll interval 16 s. The loop still works if it hits the stops; it
just can't drive the offset to zero.
 
 [...]
 
Why can't it drive the offset to zero? 100 ms should take about 5 min (if
it were always 500 PPM; the loop would make it take longer).
 
 
 That would presumably be in the case of 'huge intrinsic frequency error'.
 
 Groetjes,
 Maarten Wiltink
 
 



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Rick Jones
Eric [EMAIL PROTECTED] wrote:
 You are probably right about the MAC cache miss not affecting
 timing.  You are the resident switch guru here.

Scary thought :)

 Of course, different manufacturers may have different methods of
 detecting the cache miss and recovering from that, so it would be
 hard to eliminate that effect from consideration entirely.  It's the
 smallest effect of all the ones I've dealt with.

Just to be certain, you are talking about MAC's being aged out of a
switch's forwarding tables right?  I interpreted it that way based on
the previous text discussing ARP caches.

rick jones
-- 
The glass is neither half-empty nor half-full. The glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Steve Kostecke
On 2008-01-28, David L. Mills [EMAIL PROTECTED] wrote:

 Eric wrote:

 [---=| TOFU protection by t-prot: 72 lines snipped |=---]

 That case and the ones you describe are exactly what the NTP burst
 mode is designed for. The first packet in the burst carves the caches
 all along the route and back. The clock filter algorithm tosses it out
 in favor of the remaining packets in the burst. No ICMP is needed or
 wanted.

Burst sends 8x packets to the remote time server at each poll interval.
This greatly increases the load posed by any one client.

Perhaps it would be useful to allow the user to specify a smaller number
of packets.

-- 
Steve Kostecke [EMAIL PROTECTED]
NTP Public Services Project - http://support.ntp.org/



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Eric
On Mon, 28 Jan 2008 19:19:12 +, David L. Mills [EMAIL PROTECTED] wrote
for the entire planet to see:

Eric,

Many years ago the Proteon routers dropped the first packet after the 
cache timed out; that was a disaster. That case and the ones you 
describe are exactly what the NTP burst mode is designed for. The first 
packet in the burst carves the caches all along the route and back. The 
clock filter algorithm tosses it out in favor of the remaining packets 
in the burst. No ICMP is needed or wanted.

Dave

I agree about ICMP.  UDP would be better.

And BURST / IBURST are nice, but conventional wisdom has it that BURST
really shouldn't be used towards servers that you don't administer, and
IBURST will of course not handle the ongoing case.  

In considering this more, I think a great option or tinker value would be
one that simply sends an extra packet, rather than eight of them, and only
if the previous poll for that association was sent more than x seconds ago.
In other words, as long as the poll value is say 7 or less, nothing new is
needed.  When the poll exceeds 7, then ten seconds before a poll is due to
be sent an explorer poll is sent (and any response would likely be
discarded).  EBURST, or maybe PAVE.
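
As a sketch of the proposed rule (hypothetical C; an EBURST/PAVE
option does not exist in ntpd, and the names here are invented):

    /* Hypothetical "explorer poll" rule as proposed above: when the
     * poll exponent exceeds 7 (128 s), schedule one extra packet
     * 10 s before the real poll; its reply would be discarded. */
    #include <stdbool.h>

    bool send_explorer(int poll_exponent, long now, long next_poll_time)
    {
        return poll_exponent > 7 && next_poll_time - now == 10;
    }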

- Eric




Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Rick Jones
Brian Utterback [EMAIL PROTECTED] wrote:
 Rick Jones wrote:
  Eric [EMAIL PROTECTED] wrote:
  Then there is the MAC cache in your switches, which generally
  purge after 1-5 minutes.  This can often be adjusted higher, but
  that can sometimes cause issues for others when they are
  reconfiguring part of the network.
  
  I suppose if STP gets involved, but just on its own, the
  forwarding table in a switch being aged should only mean that the
  next frame to that MAC will go out all (enabled) ports on the
  switch until that MAC is seen again as a source.  That shouldn't
  affect timing really.
  
  rick jones

 There is some blurring of device type going on. The problem is not
 which port to send to, but rather which MAC address to send to. This
 is more of a problem with routers than switches, but with VLANS and
 whatever these days, the device in question might be both.

I interpreted Eric's text differently.  Since a device acting as a
switch is only operating at layer 2, it doesn't do any lookups on
what the destination MAC should be.  Indeed, a device operating as a
router could be doing an ARP lookup, but I ass-u-me-d that was
covered by a prior paragraph of Eric's.

 In any case, if the needed MAC is not available, there has to be an
 ARP request and response before the packet can be sent, but this
 delay is not evident in the return trip for the NTP response packet,
 introducing an asymmetric delay, the worst thing that happen to
 NTP.

 I reported this problem many years ago and suggested using burst at
 that time, but thought that it would be overkill and asked for a way
 to tune it to a fewer number of packets in the burst.  Dave was
 reticent and I was newer to the project then and didn't want to push
 it. Perhaps it is time.

I'll probably quite easily display my profound NTP ignorance here :)
But if there is asymmetric delay stemming from an ARP resolution,
won't it affect all the packets in the burst?  Unless the transmission
time of the burst getting out of NTP is much greater than the ARP
resolution time, the entire burst is going to be blocked waiting on the
ARP resolution.

Now, if this burst was really "send a couple; wait for a reply; send a
couple more" then one might ass-u-me (I do love that spelling :) that
the couple more didn't have ARP-induced asymmetry.

rick jones
I probably tweak on switch vs router much the same way an NTP
person tweaks on accuracy vs precision :)
-- 
web2.0 n, the dot.com reunion tour...
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Eric
On Mon, 28 Jan 2008 21:39:17 + (UTC), Rick Jones [EMAIL PROTECTED]
wrote for the entire planet to see:

Eric [EMAIL PROTECTED] wrote:

 Of course, different manufacturers may have different methods of
 detecting the cache miss and recovering from that, so it would be
 hard to eliminate that effect from consideration entirely.  It's the
 smallest effect of all the ones I've dealt with.

Just to be certain, you are talking about MAC's being aged out of a
switch's forwarding tables right?  I interpreted it that way based on
the previous text discussing ARP caches.

Yup.  And I see that in the simple case the packet just floods, isn't
delayed on its original path/port, and the MAC cache update is handled
overlapped in time with the packet transfer.  

But, there may be more complicated cases; you mentioned STP, and of course
flooding causes its own delays to some degree.  Then it might be that the
switch firmware takes a slow path on the cache miss, causes an interrupt,
gets scheduled into a timeslice, updates the MAC Cache, and then redrives
the packet forwarding process.  Not ideal, if that ever happens.

- Eric



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread David Woolley
Eric wrote:

 
 I'm pleased to know I've provoked some new thoughts.  If I understand your
 post, burst mode was intended to get enough (lousy) samples into and
 through the clock filters to allow for initial sync.  Once the pipeline is
 loaded no more extra polls are needed.  
 
That's iburst, not burst.



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread David L. Mills
Eric,

Good suggestion. In either burst mode the code sends a single packet at 
the start of each poll interval, and sends the remaining packets only 
after receiving a response; packets in the burst are sent at 2-s 
intervals. The most cautious can set the headway to 512 s, in which case 
a single packet is sent at that interval or less, and two packets at 
1024 s, the second only upon receiving a response to the first.

Burst mode is of course not intended for busy servers, much less the 
national standards servers. It is intended for paths involving lossy, low 
speed nets with poll intervals of 1024 s or more. The young folks among us 
might not remember (or even have been alive) when the Internet was new and 
paths to Canada had delays up to several seconds and loss rates up to 
ten percent.

However, you give me an idea. Why not shut down the burst when the clock 
filter delivers the first sample? Gotta think about that.

Dave

Eric wrote:

 [snip]
 



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Rick Jones
 The burst is sent at 1 second intervals.  There should be lots of
 time for all the switches and routers to get their act in gear.

Ah, well chalk that one up to me being picky (perhaps even wrong :)
about network terminology then :) I always think of a burst as a
series of packets sent back-to-back.

insert suitable Emily Litella quote here

rick jones
-- 
No need to believe in either side, or any side. There is no cause.
There's only yourself. The belief is in your own precision.  - Jobert
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...



Re: [ntp:questions] NTP Statistics

2008-01-28 Thread Harlan Stenn
 In article [EMAIL PROTECTED], Steve Pearson [EMAIL PROTECTED] writes:

Steve Hi, I am using ntpv4.2.0 and have a question about the system
Steve statistics interpretation on my NTP server.

Steve The ntp page on monitoring options lists 11 'system stat' fields as
Steve follows.  http://www.eecis.udel.edu/~mills/ntp/html/monopt.html

Those are Dave's pages, and they reflect the latest -dev code.

For now, the best thing for you to do is get the source code for the version
you are running and look at the code.

The NTP Forum has projects listed to improve the documentation:

 http://ntpforum.isc.org/Main/ForumProject3
 http://ntpforum.isc.org/Main/ForumProject4

Another thing: the NTP Forum will be helping support.ntp.org provide
web-searchable documentation for different versions of NTP.

A significant purpose of the NTP Forum is to find the places where NTP is
giving you headaches and then making those headaches go away.

I'm eager to work with folks to get their companies signed up as
institutional members in the NTP Forum so we can get rid of the pain and
have significantly better lives where NTP is concerned.
-- 
Harlan Stenn [EMAIL PROTECTED]
http://ntpforum.isc.org  - be a member!



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Richard B. Gilbert
Hal Murray wrote:
I'll probably quite easily display my profound NTP ignorance here :)
But if there is asymmetric delay stemming from an ARP resolution,
won't it affect all the packets in the burst?  Unless the transmission
time of the burst getting out of NTP is much greater than the ARP
resolution time, the entire burst is going to be blocked waiting on the
ARP resolution.
 
 
 The burst is sent at 1 second intervals.  There should be lots of
 time for all the switches and routers to get their act in gear.
 

I thought that burst sent eight packets two seconds apart at each poll 
interval.  It's not appropriate for most situations.  It was designed 
for systems making infrequent dialup connections, like two or three 
times daily.



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Eric

On Mon, 28 Jan 2008 22:09:11 +, David L. Mills [EMAIL PROTECTED] wrote
for the entire planet to see:
snip

However, you give me an idea. Why not shut down the burst when the clock 
filter delivers the first sample? Gotta think about that.

Dave

Hi Dave - 

I'm pleased to know I've provoked some new thoughts.  If I understand your
post, burst mode was intended to get enough (lousy) samples into and
through the clock filters to allow for initial sync.  Once the pipeline is
loaded no more extra polls are needed.  

But the rest of this sub-thread was about poll intervals that get so large
that the intervening equipment forgets about the flow and always, from then
on, gives lousy performance on the one and only poll in that interval.  

I guess we could kill two birds with one stone and shut down burst as you
suggest, until the interval gets longer, when it could make a reappearance,
perhaps as only a pair of packets.

- Eric



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Richard B. Gilbert
Eric wrote:
 On Mon, 28 Jan 2008 22:09:11 +, David L. Mills [EMAIL PROTECTED] wrote
 for the entire planet to see:
 snip
 
However, you give me an idea. Why not shut down the burst when the clock 
filter delivers the first sample? Gotta think about that.

Dave
 
 
 Hi Dave - 
 
 I'm pleased to know I've provoked some new thoughts.  If I understand your
 post, burst mode was intended to get enough (lousy) samples into and
 through the clock filters to allow for initial sync.  Once the pipeline is
 loaded no more extra polls are needed.  

I think you are confusing iburst and burst.   Iburst is used at 
startup to fill the pipeline and get a fast startup.  Following that 
initial burst, a single request packet is sent at each poll interval.

Burst mode is used in situations where a system connects to a server at 
intervals measured in hours; e.g. two to four times per day.  The 
samples are not necessarily lousy, they are just obtained infrequently.
Eight samples fill the pipeline and satisfy the filter.






Re: [ntp:questions] strange behaviour of ntp peerstats entries.

2008-01-28 Thread David L. Mills
Unruh,

It would seem self evident from the equations that minimizing the delay 
variance truly does minimize the offset variance. Further evidence of 
that is in the raw versus filtered offset graphs in the architecture 
briefings. If nothing else, the filter reduces the variance by some 10 
dB. More to the point, emphasis added, the wedge scattergrams show just 
how good the filter can be. It selects points near the apex of the 
wedge, the others don't matter. You might argue the particular clock 
filter algorithm could be improved, but the mission in any case is to 
select the points at or near the apex.

While the authors might not have realized it, the filter method you 
describe is identical to Cristian's Probabilistic Clock Synchronization 
(PCS) method described in the literature some years back. The idea is 
to discard the outlier delays beyond a decreasing threshold. In other 
words, the tighter the threshold, the more outliers are tossed out, so 
you strike a balance. I argued then and now that it is better to select 
the best from among the samples rather than to selectively discard the 
outliers.

There may be merit in an argument that says the points along the limbs 
of the wedge are being ignored. In principle, these points can be found 
using a selective filter that searches for an offset/delay ratio of 0.5, 
which in fact is what the huff-n'-puff filter does. To do this 
effectively you need to know the baseline propagation delay, which is 
also what the huff-n'-puff filter does. Experiments doing this with 
symmetric delays, as against the asymmetric delays the huff-n'-puff 
filter was designed for, were inconclusive.
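
A sketch of that huff-n'-puff idea (illustrative C, simplified from
the description above rather than copied from ntpd's loop filter):
with the baseline delay known, a sample on a wedge limb has its offset
displaced by half the excess delay, so shift it back.

    /* Illustrative only: correct a limb sample's offset by half
     * the delay in excess of the baseline propagation delay. */
    double huffpuff_correct(double offset, double delay,
                            double min_delay)
    {
        double excess = (delay - min_delay) / 2.0;
        if (offset > excess)       /* on the +0.5 limb */
            return offset - excess;
        if (offset < -excess)      /* on the -0.5 limb */
            return offset + excess;
        return offset;             /* inside the wedge: leave it */
    }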

Dave

Unruh wrote:

snip



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread David L. Mills
Steve,

You might have missed my message about rate control on the hackers list. 
The average headway for all packets, including those in a burst, is 
strictly controlled at 16 s. So: 1 packet in a burst at a 16-s poll 
interval, 2 at 32 s, 4 at 64 s, and 8 at 128 s and higher. The default 
average headway can be set by a configuration command.
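
In other words, the packets allowed per burst scale with the poll
interval (illustrative C of the rule as just stated, not the ntpd
source):

    /* Illustrative only: with a 16-s average headway, a burst may
     * hold at most interval/16 packets, capped at 8. */
    int burst_packets(int poll_interval_s, int avg_headway_s)
    {
        int n = poll_interval_s / avg_headway_s;
        if (n < 1) n = 1;
        if (n > 8) n = 8;
        return n;    /* burst_packets(128, 16) == 8, etc. */
    }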

The scheme is specifically designed for long, noisy Internet paths and 
large poll intervals where the clock filter is most effective and also 
for cases involving dialup links with highly variable call setup delays. 
The fact that it trounces ARP caches is a secondary benefit.

ICMP pings will not work to our campus machines from outside. ICMP 
request messages are dropped by the ingress router.

Dave

Steve Kostecke wrote:
 [snip]
 
 Burst sends 8x packets to the remote time server at each poll interval.
 This greatly increases the load posed by any one client.
 
 Perhaps it would be useful to allow the user to specify a smaller number
 of packets.
 



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread Eric
On Mon, 28 Jan 2008 17:44:08 -0500, Richard B. Gilbert
[EMAIL PROTECTED] wrote for the entire planet to see:

I thought that burst sent eight packets two seconds apart at each poll 
interval.  It's not apprropriate for most situations.  It was designed 
for systems making infrequent dialup connections like twice or three 
times daily.

My confusion.  IBURST for the Initial loading of the buffer.  BURST for the
very, very infrequent connection to reload the entire buffer each time.

But what about the idea that IBURST is nice for fast startup, BURST is
helpful if there hasn't been a poll for a very, very long time, and now the
new idea for an explorer packet (only one extra) that would be nice to
smooth the network path when the polling interval goes over a couple of
minutes.  

It turns each of them into virtually the same case, classified by the
polling interval currently in effect.

- Eric


 



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread David L. Mills
Eric,

There are actually two burst modes: IBURST, used when the server is 
unreachable, and BURST, used when it is. They are independent of each 
other and both can be used at the same time. Currently, IBURST uses 6 
packets, as that is a couple more than needed to pass the distance 
threshold and synchronize the clock. This actually is recommended; the 
following packets, whether burst or not, are delayed so that the average 
headway does not exceed the specified threshold, by default 16 s. The 
BURST mode also obeys the headway restrictions, but is intended to 
de-jitter in the cases I mentioned.

What set off my bell in response to your remark was an interesting 
observation when watching the clock filter operate. Start the daemon 
with a -d flag and watch the clock_filter and local_clock traces. Notice 
that there are often several samples discarded as not younger than the 
last used sample. This is a normal situation; however, it reveals that 
the probability of using another sample just after using one is 
relatively low. In other words, when you find a sample you might as well 
give up and wait for the next burst. This needs to be confirmed.

Dave

Eric wrote:

 [snip]
 



Re: [ntp:questions] NTP Statistics

2008-01-28 Thread David L. Mills
Steve,

The best place to check the data is in the ntp_util.c file, 
record_sys_stats() routine. I recently added another stat, but you might 
not be using the most recent version.

The time since startup is in hours. The packets received are the total 
number of packets received. The server packets received are in response 
to packets sent from an association on your machine.

There are so many little trails in and out of the machine, like 
control/monitor packets, etc., and so many little ways a packet can be 
dropped, that the counters might not catch each and every wee thing.

Dave

Steve Pearson wrote:

 [snip]



Re: [ntp:questions] very slow convergence of ntp to correct time.

2008-01-28 Thread David L. Mills
Hal,

Not any more. Current NTPv4 sends the burst at 2-s intervals, mainly to 
coordinate with Autokey opportunities and reduce the total number of 
packets.

Dave

Hal Murray wrote:

 [snip]
 
 The burst is sent at 1 second intervals.  There should be lots of
 time for all the switches and routers to get their act in gear.
 



Re: [ntp:questions] strange behaviour of ntp peerstats entries.

2008-01-28 Thread Unruh
David L. Mills [EMAIL PROTECTED] writes:

Unruh,

It would seem self evident from the equations that minimizing the delay 
variance truly does minimize the offset variance. Further evidence of 
that is in the raw versus filtered offset graphs in the architecture 
briefings. If nothing else, the filter reduces the variance by some 10 
dB. More to the point, emphasis added, the wedge scattergrams show just 

I guess then I am confused, because my data does not support that. While the
delay variance IS reduced, the offset variance is not. The correlation
between delay and offset IS reduced by a factor of 10, but the clock
variance is reduced not at all. 

Here are the results from one day, gathered from one clock. (I had ntp not
only print out the peer-offset and peer-delay as it does in
record_peer_stats, but also the p_offset and p_del, the offset and delay
calculated for each packet.) I also throw out the outliers (for some reason
the system would all of a sudden have packets which were 4 ms round trip,
rather than 160 usec; these popcorn spikes are clearly bad). The difference
between the variance as calculated from the peer-offset values and the
p_offset values was

.5995 (p_offset with del spikes greater than .0003 eliminated)
.6017 (peer-offset std dev )
.07337 (p_delay standard deviation, with the greater than .0003 spikes
removed)
.05489 (peer-delay std dev)

(Note that if those popcorn spikes had not been removed, the std dev of the
p_offset and p_delay would have been much larger.)
I.e., it makes no difference at all to the offset std dev, but a significant
one to the delay. (Yes, the precision I quote the numbers at is far greater
than the accuracy.)
This is throwing away 83% of the data in the peer- case. 

Note that this is one machine on one day, etc., and well after the startup
transients had disappeared.
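
A sketch of such a comparison (illustrative C, not my actual analysis
script; the 0.0003 s spike threshold is the one mentioned above):

    /* Illustrative only: standard deviation of per-packet offsets
     * after discarding samples whose delay exceeds a spike
     * threshold (0.0003 s in the text). */
    #include <math.h>
    #include <stddef.h>

    double offset_stddev(const double *offset, const double *delay,
                         size_t n, double max_delay)
    {
        double sum = 0.0, sum2 = 0.0;
        size_t kept = 0;

        for (size_t i = 0; i < n; i++) {
            if (delay[i] > max_delay)
                continue;              /* drop popcorn spikes */
            sum  += offset[i];
            sum2 += offset[i] * offset[i];
            kept++;
        }
        if (kept < 2)
            return 0.0;
        double mean = sum / kept;
        return sqrt((sum2 - kept * mean * mean) / (kept - 1));
    }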

 
how good the filter can be. It selects points near the apex of the 
wedge, the others don't matter. You might argue the particular clock 
filter algorithm could be improved, but the mission in any case is to 
select the points at or near the apex.


While the authors might not have realized it, the filter method you 
describe is identical to Cristian's Probabilistic Clock Synchronization 
(PCS) method described in the literature some years back. The idea is 

I have no idea if Curnoe knew that. The majority of his code was written 10
years ago, not recently. He uses only the inverse of the delay as the
weights I believe, with a user adjustable parameter to throw away delays
which are too large. 

to discard the outlier delays beyond a decreasing threshold. In other 
words, the tighter the threshold, the more outliers are tossed out, so 
you strike a balance. I argued then and now that it is better to select 
the best from among the samples rather than to selectively discard the 
outliers.



There may be merit in an argument that says the points along the limbs 
of the wedge are being ignored. In principle, these points can be found 
using a selective filter that searches for an offset/delay ratio of 0.5, 
which in fact is what the huff-n'-puff filter does. To do this 
effectively you need to know the baseline propagation delay, which is 
also what the huff-n'-puff filter does. Experiments doing this with 
symmetric delays, as against the asymmetric delays the huff-n'-puff 
filter was designed for, were inconclusive.

But from what I see of the code, the huff-n'-puff occurs after 80% have
already been discarded by the clock_filter.

If data were cheap (and I think that in most cases today it is), then
throwing away 80% is fine. There is lots more out there. But this
profligacy in the treatment of the data sits uncomfortably with the
competing claim that collecting data is expensive -- you should never use
maxpoll less than 7, you should bother the ntp servers as little as
possible. That makes the data precious. You cannot simply go out and
collect all you want. Then throwing it away seems a bad idea to me.


Dave

Unruh wrote:

snip
