On 11.10.2012 09:30, John-Mark Gurney wrote:
Alexander Motin wrote this message on Thu, Oct 11, 2012 at 01:43 +0300:
On 08.10.2012 07:02, John-Mark Gurney wrote:
I recently put together a new machine w/ a SuperMicro H8SCM and an
AMD Opteron 4228 HE...  I'm having an issue where the clock on the
machine skips around...  The weird part is that it's very sudden when
it happens...  ntp sometimes brings it back, but it can't when the clock
gets too far ahead (1000 seconds), ntp dies...

In order to catch it happening, I ran a sleep 60 loop fetching time
from another server that keeps time correctly via:
while sleep 60; do echo -n h2:; nc h2 13; date; ntpdate h2.funkthat.com;
done

here are some snippets:
h2:Sun Oct  7 17:12:54 2012^M
Sun Oct  7 17:12:54 PDT 2012
  7 Oct 17:12:54 ntpdate[31036]: the NTP socket is in use, exiting
h2:Sun Oct  7 17:13:48 2012^M
Sun Oct  7 17:20:21 PDT 2012
  7 Oct 17:20:21 ntpdate[31045]: the NTP socket is in use, exiting

but then ntp brings it back in sync:
h2:Sun Oct  7 17:28:49 2012^M
Sun Oct  7 17:35:21 PDT 2012
  7 Oct 17:35:21 ntpdate[31164]: the NTP socket is in use, exiting
h2:Sun Oct  7 17:29:49 2012^M
Sun Oct  7 17:29:49 PDT 2012
  7 Oct 17:29:49 ntpdate[31170]: the NTP socket is in use, exiting

It happens pretty often:
Oct  7 00:19:13 gold ntpd[3721]: time reset -785.347912 s
Oct  7 00:46:37 gold ntpd[3721]: time reset -392.673256 s
Oct  7 01:04:24 gold ntpd[3721]: time reset -785.346533 s
Oct  7 15:00:59 gold ntpd[3721]: time reset -392.681720 s
Oct  7 16:32:11 gold ntpd[3721]: time reset -392.671268 s
Oct  7 17:29:29 gold ntpd[3721]: time reset -392.671752 s
Oct  7 18:04:37 gold ntpd[3721]: time reset -785.346987 s

but as you can see above, the time slip happens abruptly.. looks like
a rounding error or something...
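
One rough way to check the "rounding error" impression is to compare the step sizes in the ntpd log above with each other. The sketch below is Python and uses only the log lines quoted above; the variable names are illustrative. It treats the smallest step as a base unit and expresses every step as a multiple of it.

```python
import re

# ntpd log lines copied from the message above.
LOG = """\
Oct  7 00:19:13 gold ntpd[3721]: time reset -785.347912 s
Oct  7 00:46:37 gold ntpd[3721]: time reset -392.673256 s
Oct  7 01:04:24 gold ntpd[3721]: time reset -785.346533 s
Oct  7 15:00:59 gold ntpd[3721]: time reset -392.681720 s
Oct  7 16:32:11 gold ntpd[3721]: time reset -392.671268 s
Oct  7 17:29:29 gold ntpd[3721]: time reset -392.671752 s
Oct  7 18:04:37 gold ntpd[3721]: time reset -785.346987 s
"""

# Magnitude of each step, in seconds.
steps = [abs(float(m)) for m in re.findall(r"time reset (-?\d+\.\d+) s", LOG)]

# Use the smallest step as the unit and see how the others relate to it.
base = min(steps)
multiples = [round(s / base) for s in steps]
residuals = [s - n * base for s, n in zip(steps, multiples)]

for s, n, r in zip(steps, multiples, residuals):
    print(f"{s:11.6f} s = {n} x {base:.6f} s {r:+.6f} s")
```

On these seven entries every step comes out as one or two times roughly 392.67 s, with residuals on the order of 10 ms or less, which suggests a fixed quantum rather than gradual drift.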

I'm now reducing the sleep to 5 seconds... but as you can see the sleep
ends a few seconds early and local time suddenly jumped forward 6
minutes 33 seconds...
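
The 6 minutes 33 seconds figure can be reproduced from the second snippet above. A minimal sketch in Python, assuming both hosts report the same time zone (the h2 daytime output carries no zone name); `parse_stamp` is a hypothetical helper written for this illustration.

```python
from datetime import datetime

MONTHS = {name: i + 1 for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())}

def parse_stamp(text):
    """Parse 'Sun Oct  7 17:13:48 2012' or 'Sun Oct  7 17:20:21 PDT 2012'.

    A zone name, if present, is ignored (assumption: same zone on both hosts).
    The literal '^M' left by the daytime service is stripped first.
    """
    fields = text.replace("^M", "").split()
    month, day, clock, year = fields[1], fields[2], fields[3], fields[-1]
    hh, mm, ss = (int(x) for x in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hh, mm, ss)

remote = parse_stamp("Sun Oct  7 17:13:48 2012^M")   # reported by h2
local = parse_stamp("Sun Oct  7 17:20:21 PDT 2012")  # reported by local date
offset = (local - remote).total_seconds()
print(f"local clock is ahead by {offset:.0f} s")
```

393 s is 6 min 33 s, within a second of the smallest steps in the ntpd log (about 392.67 s); the one-second resolution of the daytime output limits how closely the two can agree.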

$ sysctl kern.timecounter
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(1000) ACPI-safe(850) HPET(950) i8254(0) dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 11598
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.HPET.counter: 3257069245
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.ACPI-safe.mask: 16777215
kern.timecounter.tc.ACPI-safe.counter: 4219134510
kern.timecounter.tc.ACPI-safe.frequency: 3579545
kern.timecounter.tc.ACPI-safe.quality: 850
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.TSC-low.counter: 2854866610
kern.timecounter.tc.TSC-low.frequency: 10937740
kern.timecounter.tc.TSC-low.quality: 1000
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
$ sysctl kern.eventtimer
kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.et.LAPIC.frequency: 100002217
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.activetick: 1
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2

I have switched my timecounter to HPET to see if things are different...

Any clues?

The switch to HPET that you mentioned could tell a lot about the problem.
Switching the event timer may also be interesting.

Since I switched to HPET, it hasn't happened at all in the last 3 days..

That probably points to some problem with the TSC timecounter. What is strange to me is the time jump size of 5 minutes. The TSC timecounter should overflow every few seconds, so a single jump should be only that big.
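
The overflow period can be estimated from the `mask` and `frequency` values in the sysctl output above. A small Python sketch, assuming a timecounter wraps after `mask + 1` counts; the dictionary and variable names are illustrative, not kernel identifiers.

```python
# (mask, frequency in Hz) as posted under kern.timecounter.tc.*
COUNTERS = {
    "i8254": (65535, 1193182),
    "HPET": (4294967295, 14318180),
    "ACPI-safe": (16777215, 3579545),
    "TSC-low": (4294967295, 10937740),
}

# Assumption: one full wrap takes (mask + 1) counts at the given frequency.
wrap_seconds = {name: (mask + 1) / freq
                for name, (mask, freq) in COUNTERS.items()}

for name, period in wrap_seconds.items():
    print(f"{name:10s} wraps every {period:10.4f} s")
```

With these numbers TSC-low comes out at about 392.67 s per wrap rather than a few seconds, which is close to the -392.67 s steps ntpd logged and half of the -785.35 s ones. This is only arithmetic on the posted values and does not by itself say why a wrap would be miscounted.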

Should I try switching back to TSC and switching the event timer?  Do you
need any other info, or want me to try anything else?

You may try that, to be sure the eventtimers are not related to this case.

Oh, forgot to include the specific processor info in my previous
email:
CPU: AMD Opteron(tm) Processor 4228 HE               (2800.05-MHz K8-class CPU)
   Origin = "AuthenticAMD"  Id = 0x600f12  Family = 0x15  Model = 0x1  Stepping = 2
   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
   Features2=0x1e98220b<SSE3,PCLMULQDQ,MON,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX>
   AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
   AMD Features2=0x1c9bfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,NodeId,Topology,<b23>,<b24>>
   TSC: P-state invariant, performance statistics

Unfortunately, I don't know the specifics of AMD processors. Maybe jkim@ or avg@ will remember something. As far as I know, the kernel should block entering sleep states on AMD CPUs when the LAPIC eventtimer is used (by default). In that case I guess the TSC should also work fine. But I don't know what other possible sources of asynchronicity there may be.

--
Alexander Motin
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
